CELLULASES, NUCLEIC ACIDS ENCODING THEM
AND METHODS FOR MAKING AND USING THEM

GOVERNMENT SUPPORT 
This invention was made with United States Government support under Contract
Nos. DE-FG03-02ER83395 and DE-FG02-03ER83865, awarded by the Department of
Energy. The Government has certain rights in this invention.

FIELD OF THE INVENTION 
This invention relates to molecular and cellular biology and biochemistry. In one
aspect, the invention provides polypeptides having cellulase activity, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or β-glucosidase activity, polynucleotides encoding
these polypeptides, and methods of making and using these polynucleotides and
polypeptides. In one aspect, the invention is directed to polypeptides having cellulase
activity, e.g., endoglucanase, cellobiohydrolase, mannanase and/or β-glucosidase activity,
including thermostable and thermotolerant activity, and polynucleotides encoding these
enzymes, and making and using these polynucleotides and polypeptides. The
polypeptides of the invention can be used in a variety of pharmaceutical, agricultural and
industrial contexts.

BACKGROUND 

Cellulose is the most abundant renewable resource on earth. It is composed of a
linear chain of β-1,4-linked glucose units with the repeating unit being cellobiose, which is a
glucose dimer having a structure as shown in Figure 5. The polymer is degraded by a
suite of enzymes which include endoglucanases (EG), which randomly hydrolyze the
cellulose polymer, and cellobiohydrolases (CBH), which remove terminal cellobiose
residues from cellulose. Cellobiose and cello-oligosaccharides are hydrolyzed to glucose
by β-glucosidases (BG). All three of these enzymes are necessary for the complete
breakdown of cellulose to glucose. For each of these three enzymes, different structural
variants exist that perform the same function. In addition, fungi and bacteria are known
to produce multiple forms of the same structural variants in addition to different structural
variants.

Further complicating this system is the fact that some anaerobic bacteria and fungi
are known to produce these enzymes in multi-enzyme complexes which contain multiple
enzymes all attached to an enzyme scaffold with molecular weights above 2 million
daltons. Why is such a complex system of enzymes necessary for such a simple
molecule? Some researchers believe that this complexity is due to the recalcitrant nature
of the substrate. The cellulose chains form microfibrils that pack into a crystalline matrix
via hydrogen bonding of adjacent chains. This structure is highly resistant to chemical or
enzymatic degradation.

CBHs are thought to be the key enzymes in the degradation of this crystalline
cellulose because of the nature of their enzymatic attack on cellulose. EGs, unlike CBHs,
have an open cleft that attacks the cellulose chain at a perpendicular angle. CBHs attack
the chain directly via a tunnel containing the active site. The current thought is that the
cellulose chains enter the tunnel and, at the same time, adjacent hydrogen bonding is
disrupted. Once the cellobiohydrolases have established this "foothold" on the substrate,
the EGs can then come in and more readily attack the substrate.

A major deficiency of known CBHs is their low catalytic activity. Some groups
argue that the low activity stems from the fact that energy from hydrolysis is transferred
to kinetic energy to disrupt hydrogen bonds and enable the enzyme to move along the
substrate. CBHs are exo-acting enzymes and are found in 6 of the 90 families of glycosyl
hydrolases. They include families 5, 6, 7, 9, 10 and 48. Family 5 contains many different
types of glycosyl hydrolases including cellulases, mannanases and xylanases. Although
most cellulases in this family are endoglucanases, there are examples of
cellobiohydrolases, most notably CelO from Clostridium thermocellum. Family 6
contains only endoglucanases or cellobiohydrolases, with more cellobiohydrolase
members than endoglucanases. The enzymes have an inverting mechanism and
crystallographic studies suggest that the enzyme has a distorted α/β barrel structure
containing seven, not eight, parallel β-strands. Family 7 enzymes are also composed of
both endoglucanases and cellobiohydrolases, with more cellobiohydrolases, and the only
known members are from fungi. The enzyme has a retaining mechanism and the crystal
structure suggests a β-jellyroll structure. Family 9 contains endoglucanases,
cellobiohydrolases and β-glucosidases with a preponderance of endoglucanases.
However, Thermobifida fusca produces an endo/exo-1,4-glucanase, the crystal structure
of which suggests an (α/α)6 barrel fold. The enzyme has characteristics of both endo- and
exo-glucanases (CBHs). Family 10 contains only 2 members described as
cellobiohydrolases, with the rest mainly described as xylanases. Cellobiohydrolases and
xylanases from family 10 have activity on methyl-umbelliferyl cellobioside. Family 48
contains mainly bacterial and anaerobic fungal cellobiohydrolases and endoglucanases.
The structure is an (α/α)6 barrel fold similar to family 9.

There is a need for less expensive and renewable sources of fuel for road vehicles.
New fuel sources will be more attractive if they produce nonharmful end products after
combustion. Ethanol offers an attractive alternative to petroleum-based fuels and can be
obtained through the fermentation of monomeric sugars derived from starch or
lignocellulose. However, current economics do not support the widespread use of ethanol
due to the high cost of generating it. One area of research aimed at decreasing costs is
enhancement of the technical efficacy of the enzymes that can be used to generate
fermentable sugars from lignocellulose. The development of enzymes that more
efficiently digest feedstock will translate to decreased ethanol production costs. More
efficient processes will decrease the United States' reliance on foreign oil and the price
fluctuations that may be related to that reliance. Using cleaner fuels for transportation,
like bioethanol, also may decrease net CO2 emissions that are believed to be partially
responsible for global warming.

SUMMARY 

The invention provides cellulases, e.g., endoglucanases, cellobiohydrolases and/or
β-glucosidases (beta-glucosidases), and methods for making and using them. In one
aspect, the enzymes of the invention have an increased catalytic rate to improve the
process of substrate hydrolysis. This increased efficiency in catalytic rate leads to an
increased efficiency in producing sugars, which can be useful in industrial applications,
e.g., the sugars so produced can be used by microorganisms for ethanol production. In
one aspect, the invention provides highly active (e.g., having an increased catalytic rate)
cellobiohydrolases, endoglucanases and beta-glucosidases. The invention provides
industrial applications (e.g., biomass to ethanol) using enzymes of the invention having
decreased enzyme costs, e.g., decreased costs in biomass to ethanol processes. Thus, the
invention provides efficient processes for producing bioethanol and bioethanol-comprising
compositions, including fuels comprising bioethanol, from any biomass.

In one aspect, the enzymes of the invention have a glucanase, e.g., an
endoglucanase, activity, e.g., catalyzing hydrolysis of internal endo-β-1,4- and/or β-1,3-
glucanase linkages. In one aspect, the endoglucanase activity (e.g., endo-1,4-beta-D-glucan
4-glucano hydrolase activity) comprises hydrolysis of 1,4- and/or β-1,3-beta-D-glycosidic
linkages in cellulose, cellulose derivatives (e.g., carboxy methyl cellulose and
hydroxy ethyl cellulose), lichenin, beta-1,4 bonds in mixed beta-1,3 glucans, such as
cereal beta-D-glucans or xyloglucans, and other plant material containing cellulosic parts.

In one aspect, the enzymes of the invention have endoglucanase (e.g.,
endo-beta-1,4-glucanases, EC 3.2.1.4; endo-beta-1,3(1)-glucanases, EC 3.2.1.6;
endo-beta-1,3-glucanases, EC 3.2.1.39) activity and can hydrolyze internal β-1,4- and/or
β-1,3-glucosidic linkages in cellulose and glucan to produce smaller molecular weight glucose
and glucose oligomers. The invention provides methods for producing smaller molecular
weight glucose and glucose oligomers using these enzymes of the invention.

In one aspect, the enzymes of the invention are used to generate glucans, e.g.,
polysaccharides formed from 1,4-β- and/or 1,3-glycoside-linked D-glucopyranose. In one
aspect, the endoglucanases of the invention are used in the food industry, e.g., for baking
and fruit and vegetable processing, breakdown of agricultural waste, in the manufacture
of animal feed, in pulp and paper production, textile manufacture and household and
industrial cleaning agents. In one aspect, the enzymes, e.g., endoglucanases, of the
invention are produced by a microorganism, e.g., by a fungus and/or a bacterium.

In one aspect, the enzymes, e.g., endoglucanases, of the invention are used to
hydrolyze beta-glucans (β-glucans), which are major non-starch polysaccharides of
cereals. The glucan content of a polysaccharide can vary significantly depending on
variety and growth conditions. The physicochemical properties of this polysaccharide are
such that it gives rise to viscous solutions or even gels under oxidative conditions. In
addition, glucans have high water-binding capacity. All of these characteristics present
problems for several industries including brewing, baking and animal nutrition. In brewing
applications, the presence of glucan results in wort filterability and haze formation issues.
In baking applications (especially for cookies and crackers), glucans can create sticky
doughs that are difficult to machine and reduce biscuit size. Thus, the enzymes, e.g.,
endoglucanases, of the invention are used to decrease the amount of β-glucan in a
β-glucan-comprising composition, e.g., enzymes of the invention are used in processes to
decrease the viscosity of solutions or gels; to decrease the water-binding capacity of a
composition, e.g., a β-glucan-comprising composition; in brewing processes (e.g., to
increase wort filterability and decrease haze formation); to decrease the stickiness of
doughs, e.g., those for making cookies, breads, biscuits and the like.

In addition, carbohydrates (e.g., β-glucan) are implicated in rapid rehydration of
baked products resulting in loss of crispiness and reduced shelf-life. Thus, the enzymes,
e.g., endoglucanases, of the invention are used to retain crispiness, increase crispiness, or
reduce the rate of loss of crispiness, and to increase the shelf-life of any
carbohydrate-comprising food, feed or drink, e.g., a β-glucan-comprising food, feed or drink.

Enzymes, e.g., endoglucanases, of the invention are used to decrease the viscosity
of gut contents (e.g., in animals, such as ruminant animals, or humans), e.g., those with
cereal diets. Thus, in alternative aspects, enzymes, e.g., endoglucanases, of the invention
are used to positively affect the digestibility of a food or feed and animal (e.g., human or
domestic animal) growth rate, and in one aspect, are used to generate higher feed
conversion efficiencies. For monogastric animal feed applications with cereal diets,
beta-glucan is a contributing factor to viscosity of gut contents and thereby adversely affects
the digestibility of the feed and animal growth rate. For ruminant animals, these
beta-glucans represent substantial components of fiber intake and more complete digestion of
glucans would facilitate higher feed conversion efficiencies. Accordingly, the invention
provides animal feeds and foods comprising endoglucanases of the invention, and in one
aspect, these enzymes are active in an animal digestive tract, e.g., in a stomach and/or
intestine.

Enzymes, e.g., endoglucanases, of the invention are used to digest cellulose or any
beta-1,4-linked glucan-comprising synthetic or natural material, including those found in
any plant material. Enzymes, e.g., endoglucanases, of the invention are used as
commercial enzymes to digest cellulose, e.g., in the wood processing, pulp and/or paper
industry, in textile manufacture and in household and industrial cleaning agents, and/or in
biomass waste processing.

In one aspect, the invention provides compositions (e.g., pharmaceutical
compositions, foods, feeds, drugs, dietary supplements) comprising the enzymes,
polypeptides or polynucleotides of the invention. These compositions can be formulated
in a variety of forms, e.g., as tablets, gels, pills, implants, liquids, sprays, powders, food,
feed pellets or as any type of encapsulated form.

The invention provides isolated or recombinant nucleic acids comprising a nucleic
acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%,
59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%,
74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%,
89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete
(100%) sequence identity to an exemplary nucleic acid of the invention, including
SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11,
SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23,
SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35,
SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47,
SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59,
SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71,
SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83,
SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95,
SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107,
SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119,
SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131,
SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143,
SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155,
SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163 and SEQ ID NO:165;
see also Tables 1, 2, and 3, Examples 1 and 4, below, and
Sequence Listing, over a region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75,
100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950,
1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650,
1700, 1750, 1800, 1850, 1900, 1950, 2000, 2050, 2100, 2200, 2250, 2300, 2350, 2400,
2450, 2500, or more residues; and in alternative aspects, these nucleic acids encode at
least one polypeptide having a cellulase activity, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase activity, or encode a polypeptide capable of
generating an antibody that can specifically bind to a polypeptide of the invention, or,
these nucleic acids can be used as probes for identifying or isolating cellulase-encoding
nucleic acids, or to inhibit the expression of cellulase-expressing nucleic acids (all these
aspects referred to as the "nucleic acids of the invention"). In one aspect, the sequence
identities are determined by analysis with a sequence comparison algorithm or by a visual
inspection.
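
By way of illustration only, the percent sequence identity over a comparison region described above can be sketched as follows. This is a minimal sketch and not the sequence comparison algorithm of the invention: it assumes two sequences that have already been aligned and padded with "-" gap characters, and the function names percent_identity and meets_identity_threshold, as well as the choice to count gapped columns in the denominator, are assumptions made for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the aligned columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns in which both sequences carry a gap; they add no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def meets_identity_threshold(
    aligned_a: str,
    aligned_b: str,
    min_identity: float = 50.0,
    min_region: int = 10,
) -> bool:
    """True if the alignment spans >= min_region columns at >= min_identity percent."""
    region = sum(
        1 for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    )
    return region >= min_region and percent_identity(aligned_a, aligned_b) >= min_identity
```

The defaults of 50% and 10 residues mirror the lowest identity and region values recited above; other definitions of identity (for example, excluding gapped columns from the denominator) would give different percentages.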

Nucleic acids of the invention also include isolated or recombinant nucleic acids
encoding an exemplary enzyme of the invention, including a polypeptide having a
sequence as set forth in
SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12,
SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24,
SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36,
SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48,
SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60,
SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72,
SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84,
SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96,
SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108,
SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120,
SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132,
SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143,
SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156,
SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164 and SEQ ID NO:166,
see also Tables 1, 2, and 3, Examples 1 and 4, below, and the Sequence Listing, and
subsequences thereof and variants thereof. In one aspect, the polypeptide has a cellulase
activity, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase activity.

In one aspect, the invention provides cellulase-encoding, e.g., endoglucanase-,
cellobiohydrolase- and/or beta-glucosidase-encoding nucleic acids having a common
novelty in that they are derived from mixed cultures. The invention provides
cellulose-degrading enzyme-encoding nucleic acids isolated from mixed cultures comprising a
polynucleotide of the invention, e.g., a sequence having at least about 10%, 15%, 20%,
25%, 30%, 35%, 40%, 45%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%,
60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%,
75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%,
90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete (100%)
sequence identity to an exemplary nucleic acid of the invention, e.g.,
SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11,
SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23,
SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35,
SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47,
SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59,
SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71,
SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83,
SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95,
SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107,
SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119,
SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131,
SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143,
SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155,
SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163 and SEQ ID NO:165,
and see Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing,
over a region of at least about 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550,
600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, or more.

In one aspect, the invention provides cellulase enzyme-, e.g., endoglucanase
enzyme-, cellobiohydrolase enzyme- and/or beta-glucosidase enzyme-encoding nucleic
acids, including exemplary polynucleotide sequences of the invention, see also Tables 1,
2, and 3, Examples 1 and 4, below, and Sequence Listing, and the polypeptides encoded
by them, including enzymes of the invention, e.g., exemplary polypeptides of the
invention, e.g.,
SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12,
SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24,
SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36,
SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48,
SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60,
SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72,
SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84,
SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96,
SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108,
SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120,
SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132,
SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143,
SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156,
SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164 or SEQ ID NO:166,
see also Table 1 and Sequence Listing, having a common novelty in that they are derived
from a common source, e.g., an environmental source. In one aspect, the invention also
provides cellulase enzyme-, e.g., endoglucanase enzyme-, cellobiohydrolase enzyme- and/or
beta-glucosidase enzyme-encoding nucleic acids with a common novelty in that they are
derived from environmental sources, e.g., mixed environmental sources.

In one aspect, the sequence comparison algorithm is a BLAST version 2.2.2 
algorithm where a filtering setting is set to blastall -p blastp -d "nr pataa" -F F, and all 
other options are set to default. 
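
For illustration only, the settings recited above can be assembled into an argument list for the legacy NCBI blastall program. This is a hedged sketch: the helper name blastall_args and the query file name are hypothetical, and the -i (query file) option is an addition not recited in the text. The database string "nr pataa" is passed as one argument, which is why it appears quoted when the command is rendered for a shell.

```python
import shlex


def blastall_args(query_fasta, program="blastp", database="nr pataa", filtering="F"):
    """Build the argument list for a legacy blastall run with filtering switched off.

    -p, -d and -F are taken from the text; -i (query file) is an assumption added
    here so that the command is complete. All other options stay at their defaults.
    """
    return [
        "blastall",
        "-p", program,     # BLAST program to run
        "-d", database,    # database name(s), kept together as a single argument
        "-F", filtering,   # "F" disables query sequence filtering
        "-i", query_fasta, # hypothetical query file
    ]


if __name__ == "__main__":
    # Render the list as a shell-safe command line for inspection.
    print(shlex.join(blastall_args("query.fasta")))
```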

Another aspect of the invention is an isolated or recombinant nucleic acid 
including at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 
450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 
1250, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 
1950, 2000, 2050, 2100, 2200, 2250, 2300, 2350, 2400, 2450, 2500, or more consecutive 
bases of a nucleic acid sequence of the invention, sequences substantially identical 
thereto, and the sequences complementary thereto. 
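
As a non-limiting sketch, whether a candidate nucleic acid includes at least a given number of consecutive bases of a reference sequence, or of the sequence complementary thereto, could be checked as follows. The function names and the dynamic-programming approach are assumptions chosen for the example; only exact matches of unambiguous A, C, G and T bases are considered.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_shared_run(query: str, reference: str) -> int:
    """Length of the longest stretch of consecutive bases present in both sequences."""
    query, reference = query.upper(), reference.upper()
    best = 0
    prev = [0] * (len(reference) + 1)
    for q in query:
        curr = [0] * (len(reference) + 1)
        for j, r in enumerate(reference, start=1):
            if q == r:
                # Extend the run that ended at the previous base of both sequences.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def shares_consecutive_bases(query: str, reference: str, min_bases: int = 10) -> bool:
    """True if query shares >= min_bases consecutive bases with either strand."""
    forward = longest_shared_run(query, reference)
    reverse = longest_shared_run(query, reverse_complement(reference))
    return max(forward, reverse) >= min_bases
```

The default of 10 bases corresponds to the smallest length recited above; longer thresholds are passed through min_bases.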

In one aspect, the isolated or recombinant nucleic acid encodes a polypeptide
having a cellulase activity, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity, which is thermostable. The polypeptide can retain a cellulase
activity under conditions comprising a temperature range of between about 37°C to about
95°C; between about 55°C to about 85°C, between about 70°C to about 95°C, or, between
about 90°C to about 95°C. The polypeptide can retain a cellulase activity in temperatures
in the range between about 1°C to about 5°C, between about 5°C to about 15°C, between
about 15°C to about 25°C, between about 25°C to about 37°C, between about 37°C to
about 95°C, 96°C, 97°C, 98°C or 99°C, between about 55°C to about 85°C, between about
70°C to about 75°C, or between about 90°C to about 99°C, or 95°C, 96°C, 97°C, 98°C or
99°C, or more.

In another aspect, the isolated or recombinant nucleic acid encodes a polypeptide
having a cellulase activity, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity, which is thermotolerant. The polypeptide can retain a cellulase
activity after exposure to a temperature in the range from greater than 37°C to about 95°C
or anywhere in the range from greater than 55°C to about 85°C. The polypeptide can
retain a cellulase activity after exposure to a temperature in the range between about 1°C
to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C,
between about 25°C to about 37°C, between about 37°C to about 95°C, 96°C, 97°C, 98°C
or 99°C, between about 55°C to about 85°C, between about 70°C to about 75°C, or
between about 90°C to about 95°C, or more. In one aspect, the polypeptide retains a
cellulase activity after exposure to a temperature in the range from greater than 90°C to
about 99°C, or 95°C, 96°C, 97°C, 98°C or 99°C, at about pH 4.5, or more.

The invention provides isolated or recombinant nucleic acids comprising a
sequence that hybridizes under stringent conditions to a nucleic acid of the invention,
including an exemplary sequence of the invention, e.g., a sequence as set forth in
SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11,
SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23,
SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35,
SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47,
SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59,
SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71,
SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83,
SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95,
SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107,
SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119,
SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131,
SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143,
SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155,
SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163 or SEQ ID NO:165
(see also Tables 1, 2, and 3, Examples 1 and 4, below), or
fragments or subsequences thereof. In one aspect, the nucleic acid encodes a polypeptide
having a cellulase activity, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity. The nucleic acid can be at least about 10, 15, 20, 25, 30, 35,
40, 45, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800,
850, 900, 950, 1000, 1050, 1100, 1150, 1200 or more residues in length or the full length
of the gene or transcript. In one aspect, the stringent conditions comprise a wash step
comprising a wash in 0.2X SSC at a temperature of about 65°C for about 15 minutes.
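
By way of a worked example, the salt concentrations implied by a 0.2X SSC wash can be computed from the conventional composition of 1X SSC (0.15 M sodium chloride and 0.015 M trisodium citrate). The sketch below assumes that conventional recipe, which is not recited in the text, and the class and constant names are hypothetical.

```python
from dataclasses import dataclass

# Conventional 1X SSC composition (an assumption; the text gives only "0.2X SSC").
NACL_MOLAR_AT_1X = 0.15
CITRATE_MOLAR_AT_1X = 0.015


@dataclass(frozen=True)
class WashStep:
    ssc_fold: float        # e.g. 0.2 for 0.2X SSC
    temperature_c: float   # wash temperature in degrees Celsius
    minutes: float         # wash duration

    def nacl_molar(self) -> float:
        return NACL_MOLAR_AT_1X * self.ssc_fold

    def citrate_molar(self) -> float:
        return CITRATE_MOLAR_AT_1X * self.ssc_fold

    def sodium_molar(self) -> float:
        # Each trisodium citrate contributes three sodium ions.
        return self.nacl_molar() + 3 * self.citrate_molar()


# The stringent wash step recited above: 0.2X SSC, about 65 degrees C, about 15 minutes.
STRINGENT_WASH = WashStep(ssc_fold=0.2, temperature_c=65.0, minutes=15.0)
```

Under these assumptions the wash contains about 0.03 M sodium chloride and 0.003 M trisodium citrate, for roughly 0.039 M total sodium ion.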

The invention provides a nucleic acid probe for identifying or isolating a nucleic
acid encoding a polypeptide having a cellulase activity, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity, wherein the probe
comprises at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90,
95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900,
950, 1000 or more, consecutive bases of a sequence comprising a sequence of the
invention, or fragments or subsequences thereof, wherein the probe identifies the nucleic
acid by binding or hybridization. The probe can comprise an oligonucleotide comprising
at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100
consecutive bases of a sequence comprising a sequence of the invention, or fragments or
subsequences thereof.

The invention provides a nucleic acid probe for identifying or isolating a nucleic
acid encoding a polypeptide having a cellulase activity, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity, wherein the probe
comprises a nucleic acid comprising a sequence at least about 10, 15, 20, 30, 40, 50, 60,
70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800,
850, 900, 950, 1000 or more residues of a nucleic acid of the invention, e.g., a
polynucleotide having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%,
59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%,
74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%,
89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more; or complete
(100%) sequence identity to an exemplary nucleic acid of the invention. In one aspect,
the sequence identities are determined by analysis with a sequence comparison algorithm
or by visual inspection. In alternative aspects, the probe can comprise an oligonucleotide
comprising at least about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about
60 to 100 consecutive bases of a nucleic acid sequence of the invention, or a subsequence
thereof.

The invention provides an amplification primer pair for amplifying (e.g., by PCR)
a nucleic acid encoding a polypeptide having a cellulase activity, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity, wherein the primer pair is
capable of amplifying a nucleic acid comprising a sequence of the invention, or fragments
or subsequences thereof. One or each member of the amplification primer sequence pair
can comprise an oligonucleotide comprising at least about 10 to 50, or more, consecutive
bases of the sequence, or about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 or more consecutive bases of the sequence.
The invention provides amplification primer pairs, wherein the primer pair comprises a
first member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36 or more
residues of a nucleic acid of the invention, and a second member having a sequence as set
forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36 or more residues of the complementary strand of the
first member.
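
For illustration only, a primer pair of the kind described above might be derived from a template sequence as sketched below. The sketch reads "the complementary strand" as the reverse complement of the template, so that both primers are written 5' to 3'; that reading, the function names, and the default length of 20 residues are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(template: str, length: int = 20) -> tuple[str, str]:
    """Return (first member, second member) primers of the given length.

    The first member is the first (5') `length` residues of the template; the
    second member is the first (5') `length` residues of the complementary strand.
    """
    if length < 12:
        raise ValueError("the text recites primers of about 12 residues or more")
    if length > len(template):
        raise ValueError("primer length exceeds template length")
    first_member = template[:length]
    second_member = reverse_complement(template)[:length]
    return first_member, second_member
```

With these definitions the second member anneals to the 3' end of the template strand, so the pair brackets the full template; a practical primer design would additionally weigh melting temperature and self-complementarity, which are outside this sketch.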

The invention provides cellulase-encoding, e.g., endoglucanase-,
cellobiohydrolase- and/or beta-glucosidase-encoding nucleic acids generated by
amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair
of the invention. The invention provides methods of making a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme by
amplification, e.g., polymerase chain reaction (PCR), using an amplification primer pair
of the invention. In one aspect, the amplification primer pair amplifies a nucleic acid
from a library, e.g., a gene library, such as an environmental library.

The invention provides methods of amplifying a nucleic acid encoding a
polypeptide having a cellulase activity, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase activity, comprising amplification of a template
nucleic acid with an amplification primer sequence pair capable of amplifying a nucleic
acid sequence of the invention, or fragments or subsequences thereof.

The invention provides expression cassettes comprising a nucleic acid of the 
invention or a subsequence thereof. In one aspect, the expression cassette can comprise
the nucleic acid that is operably linked to a promoter. The promoter can be a viral,
bacterial, mammalian or plant promoter. In one aspect, the plant promoter can be a
potato, rice, corn, wheat, tobacco or barley promoter. The promoter can be a constitutive
promoter. The constitutive promoter can comprise CaMV35S. In another aspect, the 
promoter can be an inducible promoter. In one aspect, the promoter can be a tissue- 
specific promoter or an environmentally regulated or a developmentally regulated 
promoter. Thus, the promoter can be, e.g., a seed-specific, a leaf-specific, a root-specific, 
a stem-specific or an abscission-induced promoter. In one aspect, the expression cassette 
can further comprise a plant or plant virus expression vector. 

The invention provides cloning vehicles comprising an expression cassette (e.g., a 
vector) of the invention or a nucleic acid of the invention. The cloning vehicle can be a 
viral vector, a plasmid, a phage, a phagemid, a cosmid, a fosmid, a bacteriophage or an
artificial chromosome. The viral vector can comprise an adenovirus vector, a retroviral 
vector or an adeno-associated viral vector. The cloning vehicle can comprise a bacterial 
artificial chromosome (BAC), a plasmid, a bacteriophage P1-derived vector (PAC), a
yeast artificial chromosome (YAC), or a mammalian artificial chromosome (MAC).

The invention provides transformed cells comprising a nucleic acid of the
invention or an expression cassette (e.g., a vector) of the invention, or a cloning vehicle of
the invention. In one aspect, the transformed cell can be a bacterial cell, a mammalian
cell, a fungal cell, a yeast cell, an insect cell or a plant cell. In one aspect, the plant cell
can be soybeans, rapeseed, oilseed, tomato, cane sugar, a cereal, a potato, wheat, rice,
corn, tobacco or barley cell.

The invention provides transgenic non-human animals comprising a nucleic acid 
of the invention or an expression cassette (e.g., a vector) of the invention. In one aspect, 
the animal is a mouse, a rat, a pig, a goat or a sheep. 

The invention provides transgenic plants comprising a nucleic acid of the 
invention or an expression cassette (e.g., a vector) of the invention. The transgenic plant
can be a cereal plant, a corn plant, a potato plant, a tomato plant, a wheat plant, an oilseed
plant, a rapeseed plant, a soybean plant, a rice plant, a barley plant or a tobacco plant. 

The invention provides transgenic seeds comprising a nucleic acid of the
invention or an expression cassette (e.g., a vector) of the invention. The transgenic seed
can be a cereal plant, a corn seed, a wheat kernel, an oilseed, a rapeseed, a soybean seed,
a palm kernel, a sunflower seed, a sesame seed, a peanut or a tobacco plant seed.

The invention provides an antisense oligonucleotide comprising a nucleic acid 
sequence complementary to or capable of hybridizing under stringent conditions to a
nucleic acid of the invention. The invention provides methods of inhibiting the
translation of a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase enzyme message in a cell comprising administering to the cell or expressing
in the cell an antisense oligonucleotide comprising a nucleic acid sequence
complementary to or capable of hybridizing under stringent conditions to a nucleic acid
of the invention. In one aspect, the antisense oligonucleotide is between about 10 to 50,
about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases in length, e.g., 10,
15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more bases in
length. The invention provides methods of inhibiting the translation of a cellulase 
enzyme, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzyme message in a cell comprising administering to the cell or expressing in the cell an 
antisense oligonucleotide comprising a nucleic acid sequence complementary to or
capable of hybridizing under stringent conditions to a nucleic acid of the invention. 
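
As a rough illustration of the antisense design described above, the following Python sketch derives a DNA oligonucleotide complementary to a chosen window of an mRNA. It covers only the "complementary to" case and does not model hybridization under stringent conditions. The function name `antisense_oligo`, its parameters and the example message are assumptions made for illustration.

```python
# Hypothetical helper; the mRNA below is an arbitrary example sequence.
_RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligo (5' to 3') complementary to mrna[start:start + length]."""
    if length < 10:
        raise ValueError("the shortest length recited in the text is 10 bases")
    target = mrna.upper()[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the message")
    # Complement each base, then reverse so the oligo reads 5' to 3'.
    return target.translate(_RNA_TO_DNA_COMPLEMENT)[::-1]


print(antisense_oligo("AUGGCCAAGUUUGGG", start=0, length=10))  # ACTTGGCCAT
```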

The invention provides double-stranded inhibitory RNA (RNAi, or RNA
interference) molecules (including small interfering RNA, or siRNAs, for inhibiting 
transcription, and microRNAs, or miRNAs, for inhibiting translation) comprising a 
subsequence of a sequence of the invention. In one aspect, the siRNA is between about 
21 to 24 residues, or, about at least 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30, 31, 32, 33, 34, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more 
duplex nucleotides in length. The invention provides methods of inhibiting the 
expression of a cellulase enzyme, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme in a cell comprising administering to the cell or 
expressing in the cell a double-stranded inhibitory RNA (siRNA or miRNA), wherein the 
RNA comprises a subsequence of a sequence of the invention. 

The invention provides isolated or recombinant polypeptides comprising an amino 
acid sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 
59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 
74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 
89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or complete 
(100%) sequence identity to an exemplary polypeptide or peptide of the invention over a
region of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 
95, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350 or more residues, or over the 
full length of the polypeptide. In one aspect, the sequence identities are determined by 
analysis with a sequence comparison algorithm or by a visual inspection. Exemplary 
polypeptide or peptide sequences of the invention include SEQ ID NO:2, SEQ ID NO:4,
SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID
NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID
NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID
NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID
NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID
NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID
NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID
NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID
NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID
NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID
NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID
NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID
NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID
NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID
NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID
NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164 and SEQ
ID NO:166 (see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing),
and subsequences thereof and variants thereof. Exemplary polypeptides also include
fragments of at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 75, 80, 85, 90, 95, 100, 150,
200, 250, 300, 350, 400, 450, 500, 550, 600 or more residues in length, or over the full
length of an enzyme. Polypeptide or peptide sequences of the invention include sequence
encoded by a nucleic acid of the invention. Polypeptide or peptide sequences of the
invention include polypeptides or peptides specifically bound by an antibody of the
invention (e.g., epitopes), or polypeptides or peptides that can generate an antibody of the
invention (e.g., an immunogen).
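
To make the notion of percent sequence identity over a region concrete, the following Python sketch counts identical positions between two sequences that are assumed to be already aligned (equal length, with `-` marking gaps). It is only an illustration: an actual comparison would rely on a sequence comparison algorithm to produce the alignment, and the function name `percent_identity` and the example peptides are hypothetical.

```python
# Minimal sketch, assuming the two inputs are pre-aligned and of equal length.
def percent_identity(aligned_a: str, aligned_b: str, start: int = 0, end=None) -> float:
    """Percent of identical columns within aligned_a[start:end] and aligned_b[start:end]."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a[start:end], aligned_b[start:end])
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("comparison region is empty")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))  # 90.0
```

A value computed this way could then be compared against any of the thresholds recited above (e.g., at least about 50%) for a region of the stated minimum length.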

In one aspect, a polypeptide of the invention has at least one cellulase enzyme, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
activity. In alternative aspects, a polynucleotide of the invention encodes a polypeptide 
that has at least one cellulase enzyme, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme activity.

In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme activity is thermostable. The polypeptide can retain a 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme activity under conditions comprising a temperature range of between about 1°C
to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C,
between about 25°C to about 37°C, between about 37°C to about 95°C, between about
55°C to about 85°C, between about 70°C to about 75°C, or between about 90°C to about
95°C, or more. In another aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme activity can be thermotolerant. The
polypeptide can retain a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme activity after exposure to a temperature in the range from
greater than 37°C to about 95°C, or in the range from greater than 55°C to about 85°C. In
one aspect, the polypeptide can retain a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme activity after exposure to a temperature in the
range from greater than 90°C to about 95°C at pH 4.5.

Another aspect of the invention provides an isolated or recombinant polypeptide
or peptide comprising at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85,
90, 95, 100, 125, 150 or more consecutive bases of a polypeptide or peptide sequence of 
the invention, sequences substantially identical thereto, and the sequences complementary 
thereto. The peptide can be, e.g., an immunogenic fragment, a motif (e.g., a binding site), 
a signal sequence, a prepro sequence or an active site. 

The invention provides isolated or recombinant nucleic acids comprising a 
sequence encoding a polypeptide having a cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity and a signal 
sequence, wherein the nucleic acid comprises a sequence of the invention. The signal 
sequence can be derived from another cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme or a non-cellulase, e.g., non-endoglucanase,
non-cellobiohydrolase and/or non-beta-glucosidase enzyme (a heterologous) enzyme. 
The invention provides isolated or recombinant nucleic acids comprising a sequence 
encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme activity, wherein the sequence does not 
contain a signal sequence and the nucleic acid comprises a sequence of the invention. In 
one aspect, the invention provides an isolated or recombinant polypeptide comprising a 
polypeptide of the invention lacking all or part of a signal sequence. In one aspect, the 
isolated or recombinant polypeptide can comprise the polypeptide of the invention 
comprising a heterologous signal sequence, such as a heterologous cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme signal 
sequence or non-cellulase, e.g., non-endoglucanase, non-cellobiohydrolase and/or non- 
beta-glucosidase enzyme signal sequence. 

In one aspect, the invention provides chimeric proteins comprising a first domain 
comprising a signal sequence of the invention and at least a second domain. The protein 
can be a fusion protein. The second domain can comprise an enzyme. The enzyme can 
be a non- enzyme. 

The invention provides chimeric polypeptides comprising at least a first domain 
comprising signal peptide (SP), a prepro sequence and/or a catalytic domain (CD) of the 
invention and at least a second domain comprising a heterologous polypeptide or peptide, 
wherein the heterologous polypeptide or peptide is not naturally associated with the signal 
peptide (SP), prepro sequence and/or catalytic domain (CD). In one aspect, the
heterologous polypeptide or peptide is not a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme. The heterologous
polypeptide or peptide can be amino terminal to, carboxy terminal to or on both ends of
the signal peptide (SP), prepro sequence and/or catalytic domain (CD).

The invention provides isolated or recombinant nucleic acids encoding a chimeric 
polypeptide, wherein the chimeric polypeptide comprises at least a first domain 
comprising signal peptide (SP), a prepro domain and/or a catalytic domain (CD) of the 
invention and at least a second domain comprising a heterologous polypeptide or peptide,
wherein the heterologous polypeptide or peptide is not naturally associated with the signal
peptide (SP), prepro domain and/or catalytic domain (CD).

The invention provides isolated or recombinant signal sequences (e.g., signal 
peptides) consisting of or comprising a sequence as set forth in residues 1 to 14, 1 to 15, 1 
to 16, 1 to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1
to 27, 1 to 28, 1 to 28, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1
to 38, 1 to 40, 1 to 41, 1 to 42, 1 to 43, 1 to 44, 1 to 45, 1 to 46 or 1 to 47, of a polypeptide
of the invention, e.g., the exemplary SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ
ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID
NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID
NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID
NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID
NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID
NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID
NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID
NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID
NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID
NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID
NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID
NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID
NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:146, SEQ ID
NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID
NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164 or SEQ ID NO:166 (see
Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing). In one aspect, the
invention provides signal sequences comprising the first 14, 15, 16, 17, 18, 19, 20, 21, 22, 
23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70 
or more amino terminal residues of a polypeptide of the invention. 

In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme activity comprises a specific activity at about 37°C in the 
range from about 1 to about 1200 units per milligram of protein, or, about 100 to about
1000 units per milligram of protein. In another aspect, the cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity comprises a
specific activity from about 100 to about 1000 units per milligram of protein, or, from
about 500 to about 750 units per milligram of protein. Alternatively, the cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity
comprises a specific activity at 37°C in the range from about 1 to about 750 units per
milligram of protein, or, from about 500 to about 1200 units per milligram of protein. In
one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase enzyme activity comprises a specific activity at 37°C in the range from about
1 to about 500 units per milligram of protein, or, from about 750 to about 1000 units per
milligram of protein. In another aspect, the cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity comprises a
specific activity at 37°C in the range from about 1 to about 250 units per milligram of
protein. Alternatively, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme activity comprises a specific activity at 37°C in the range
from about 1 to about 100 units per milligram of protein.

In another aspect, the thermotolerance comprises retention of at least half of the
specific activity of the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme at 37°C after being heated to the elevated temperature.
Alternatively, the thermotolerance can comprise retention of specific activity at 37°C in
the range from about 1 to about 1200 units per milligram of protein, or, from about 500 to
about 1000 units per milligram of protein, after being heated to the elevated temperature.
In another aspect, the thermotolerance can comprise retention of specific activity at 37°C 
in the range from about 1 to about 500 units per milligram of protein after being heated to 
the elevated temperature. 
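
The "at least half of the specific activity" criterion above reduces to a simple ratio. The Python sketch below computes specific activity in units per milligram of protein and checks the retained fraction after heating. The function names, the default threshold parameter and the numbers in the example are illustrative assumptions, not measured data.

```python
# Hypothetical helpers; the example values are invented for illustration only.
def specific_activity(units: float, mg_protein: float) -> float:
    """Specific activity in units per milligram of protein."""
    if mg_protein <= 0:
        raise ValueError("protein mass must be positive")
    return units / mg_protein


def is_thermotolerant(activity_before: float, activity_after: float,
                      min_fraction: float = 0.5) -> bool:
    """True if at least `min_fraction` of the pre-heating specific activity is retained."""
    if activity_before <= 0:
        raise ValueError("pre-heating activity must be positive")
    return activity_after / activity_before >= min_fraction


before = specific_activity(units=600.0, mg_protein=1.0)  # assayed at 37°C before heating
after = specific_activity(units=330.0, mg_protein=1.0)   # assayed at 37°C after heating
print(is_thermotolerant(before, after))  # True
```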

The invention provides the isolated or recombinant polypeptide of the invention,
wherein the polypeptide comprises at least one glycosylation site. In one aspect,
glycosylation can be an N-linked glycosylation. In one aspect, the polypeptide can be
glycosylated after being expressed in a P. pastoris or a S. pombe.

In one aspect, the polypeptide can retain cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity under conditions
comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4 or more acidic pH. In another
aspect, the polypeptide can retain a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme activity under conditions comprising about
pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5 or pH 11 or more basic pH.
In one aspect, the polypeptide can retain a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity after exposure to
conditions comprising about pH 6.5, pH 6, pH 5.5, pH 5, pH 4.5 or pH 4 or more acidic
pH. In another aspect, the polypeptide can retain a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity after exposure to
conditions comprising about pH 7, pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10, pH 10.5
or pH 11 or more basic pH.

In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme of the invention has activity under alkaline conditions,
e.g., the alkaline conditions of the gut, e.g., the small intestine. In one aspect, the
polypeptide can retain activity after exposure to the acidic pH of the stomach.

The invention provides protein preparations comprising a polypeptide (including 
peptides) of the invention, wherein the protein preparation comprises a liquid, a solid or a
gel. The invention provides heterodimers comprising a polypeptide of the invention and a 
second protein or domain. The second member of the heterodimer can be a different 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzyme, a different enzyme or another protein. In one aspect, the second domain can be a 
polypeptide and the heterodimer can be a fusion protein. In one aspect, the second 
domain can be an epitope or a tag. In one aspect, the invention provides homodimers 
comprising a polypeptide of the invention. 

The invention provides immobilized polypeptides (including peptides) having 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzyme activity, wherein the immobilized polypeptide comprises a polypeptide of the 
invention, a polypeptide encoded by a nucleic acid of the invention, or a polypeptide 
comprising a polypeptide of the invention and a second domain. In one aspect, the 
polypeptide can be immobilized on a cell, a metal, a resin, a polymer, a ceramic, a glass, a
microelectrode, a graphitic particle, a bead, a gel, a plate, an array or a capillary tube. 

The invention also provides arrays comprising an immobilized nucleic acid of the 
invention, including, e.g., probes of the invention. The invention also provides arrays 
comprising an antibody of the invention.

The invention provides isolated or recombinant antibodies that specifically bind to 
a polypeptide of the invention or to a polypeptide encoded by a nucleic acid of the 
invention. These antibodies of the invention can be a monoclonal or a polyclonal 
antibody. The invention provides hybridomas comprising an antibody of the invention,
e.g., an antibody that specifically binds to a polypeptide of the invention or to a
polypeptide encoded by a nucleic acid of the invention. The invention provides nucleic
acids encoding these antibodies.

The invention provides methods of isolating or identifying a polypeptide having
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme activity comprising the steps of: (a) providing an antibody of the invention; (b)
providing a sample comprising polypeptides; and (c) contacting the sample of step (b)
with the antibody of step (a) under conditions wherein the antibody can specifically bind
to the polypeptide, thereby isolating or identifying a polypeptide having a cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity. 

The invention provides methods of making an anti-cellulase, e.g., anti-
endoglucanase, anti-cellobiohydrolase and/or anti-beta-glucosidase enzyme antibody
comprising administering to a non-human animal a nucleic acid of the invention or a
polypeptide of the invention or subsequences thereof in an amount sufficient to generate a
humoral immune response, thereby making an anti-cellulase, e.g., anti-endoglucanase,
anti-cellobiohydrolase and/or anti-beta-glucosidase enzyme antibody. The invention
provides methods of making an anti-cellulase, e.g., anti-endoglucanase, anti-
cellobiohydrolase and/or anti-beta-glucosidase immune response (cellular or humoral)
comprising administering to a non-human animal a nucleic acid of the invention or a
polypeptide of the invention or subsequences thereof in an amount sufficient to generate
an immune response (cellular or humoral).

The invention provides methods of producing a recombinant polypeptide 
comprising the steps of: (a) providing a nucleic acid of the invention operably linked to a 
promoter; and (b) expressing the nucleic acid of step (a) under conditions that allow 
expression of the polypeptide, thereby producing a recombinant polypeptide. In one 
aspect, the method can further comprise transforming a host cell with the nucleic acid of
step (a) followed by expressing the nucleic acid of step (a), thereby producing a
recombinant polypeptide in a transformed cell.

The invention provides methods for identifying a polypeptide having cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
activity comprising the following steps: (a) providing a polypeptide of the invention; or a
polypeptide encoded by a nucleic acid of the invention; (b) providing cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme substrate;
and (c) contacting the polypeptide or a fragment or variant thereof of step (a) with the
substrate of step (b) and detecting a decrease in the amount of substrate or an increase in
the amount of a reaction product, wherein a decrease in the amount of the substrate or an
increase in the amount of the reaction product detects a polypeptide having a cellulase,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
activity. In one aspect, the substrate is a cellulose-comprising compound. 

The invention provides methods for identifying cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme substrate comprising the 
following steps: (a) providing a polypeptide of the invention; or a polypeptide encoded by 
a nucleic acid of the invention; (b) providing a test substrate; and (c) contacting the 
polypeptide of step (a) with the test substrate of step (b) and detecting a decrease in the 
amount of substrate or an increase in the amount of reaction product, wherein a decrease 
in the amount of the substrate or an increase in the amount of a reaction product identifies 
the test substrate as a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase enzyme substrate. 

The invention provides methods of determining whether a test compound 
specifically binds to a polypeptide comprising the following steps: (a) expressing a 
nucleic acid or a vector comprising the nucleic acid under conditions permissive for
translation of the nucleic acid to a polypeptide, wherein the nucleic acid comprises a 
nucleic acid of the invention, or, providing a polypeptide of the invention; (b) providing a 
test compound; (c) contacting the polypeptide with the test compound; and (d) 
determining whether the test compound of step (b) specifically binds to the polypeptide. 

The invention provides methods for identifying a modulator of a cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity 
comprising the following steps: (a) providing a polypeptide of the invention or a 
polypeptide encoded by a nucleic acid of the invention; (b) providing a test compound;
(c) contacting the polypeptide of step (a) with the test compound of step (b) and 
measuring an activity of the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme, wherein a change in the cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity measured in the
presence of the test compound compared to the activity in the absence of the test
compound provides a determination that the test compound modulates the cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity.
In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme activity can be measured by providing a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme substrate
and detecting a decrease in the amount of the substrate or an increase in the amount of a
reaction product, or, an increase in the amount of the substrate or a decrease in the
amount of a reaction product. A decrease in the amount of the substrate or an increase in
the amount of the reaction product with the test compound as compared to the amount of
substrate or reaction product without the test compound identifies the test compound as
an activator of cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase enzyme activity. An increase in the amount of the substrate or a decrease in
the amount of the reaction product with the test compound as compared to the amount of
substrate or reaction product without the test compound identifies the test compound as
an inhibitor of cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase enzyme activity.
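
The activator/inhibitor decision rule described above can be written as a small classifier over reaction-product measurements. In the Python sketch below, the function name `classify_modulator`, the `tolerance` parameter (a guard against calling measurement noise a change) and the example values are assumptions introduced for illustration; the text itself does not specify a noise threshold.

```python
# Hypothetical classifier; `tolerance` is an assumed noise margin, not from the text.
def classify_modulator(product_with: float, product_without: float,
                       tolerance: float = 0.05) -> str:
    """Classify a test compound from reaction product formed with and without it."""
    if product_without <= 0:
        raise ValueError("control reaction must yield a positive amount of product")
    relative_change = (product_with - product_without) / product_without
    if relative_change > tolerance:
        return "activator"   # more product formed in the presence of the compound
    if relative_change < -tolerance:
        return "inhibitor"   # less product formed in the presence of the compound
    return "no effect"


print(classify_modulator(product_with=15.0, product_without=10.0))  # activator
print(classify_modulator(product_with=4.0, product_without=10.0))   # inhibitor
```

The same comparison could be run on remaining substrate instead of product, with the signs reversed.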

The invention provides computer systems comprising a processor and a data 
storage device wherein said data storage device has stored thereon a polypeptide sequence 
or a nucleic acid sequence of the invention (e.g., a polypeptide or peptide encoded by a
nucleic acid of the invention). In one aspect, the computer system can further comprise a
sequence comparison algorithm and a data storage device having at least one reference
sequence stored thereon. In another aspect, the sequence comparison algorithm
comprises a computer program that indicates polymorphisms. In one aspect, the
computer system can further comprise an identifier that identifies one or more features in
said sequence. The invention provides computer readable media having stored thereon a
polypeptide sequence or a nucleic acid sequence of the invention. The invention provides
methods for identifying a feature in a sequence comprising the steps of: (a) reading the
sequence using a computer program which identifies one or more features in a sequence,
wherein the sequence comprises a polypeptide sequence or a nucleic acid sequence of the
invention; and (b) identifying one or more features in the sequence with the computer
program. The invention provides methods for comparing a first sequence to a second
sequence comprising the steps of: (a) reading the first sequence and the second sequence
through use of a computer program which compares sequences, wherein the first
sequence comprises a polypeptide sequence or a nucleic acid sequence of the invention;
and (b) determining differences between the first sequence and the second sequence with
the computer program. The step of determining differences between the first sequence
and the second sequence can further comprise the step of identifying polymorphisms. In
one aspect, the method can further comprise an identifier that identifies one or more
features in a sequence. In another aspect, the method can comprise reading the first
sequence using a computer program and identifying one or more features in the sequence.
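
The comparison step described above, reading two sequences and determining the differences between them, can be sketched in a few lines of Python. The sketch assumes two sequences of equal length that are already in register (no insertions or deletions), which is a simplification of what a full sequence comparison program would handle. The function name `find_differences` and the example sequences are hypothetical.

```python
# Minimal sketch, assuming equal-length sequences that are already in register.
def find_differences(first: str, second: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, residue in first, residue in second) for each mismatch."""
    if len(first) != len(second):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(first, second), start=1)
        if a != b
    ]


print(find_differences("ATGGCC", "ATGACC"))  # [(4, 'G', 'A')]
```

Each reported position is a candidate polymorphism in the sense used above; an empty list means the two sequences are identical.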

The invention provides methods for isolating or recovering a nucleic acid
encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme activity from an environmental sample
comprising the steps of: (a) providing an amplification primer sequence pair for
amplifying a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity, wherein the
primer pair is capable of amplifying a nucleic acid of the invention; (b) isolating a nucleic
acid from the environmental sample or treating the environmental sample such that
nucleic acid in the sample is accessible for hybridization to the amplification primer pair;
and, (c) combining the nucleic acid of step (b) with the amplification primer pair of step
(a) and amplifying nucleic acid from the environmental sample, thereby isolating or
recovering a nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity from an
environmental sample. One or each member of the amplification primer sequence pair
can comprise an oligonucleotide comprising an amplification primer sequence pair of the
invention, e.g., having at least about 10 to 50 consecutive bases of a sequence of the
invention.

The invention provides methods for isolating or recovering a nucleic acid
encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme activity from an environmental sample
comprising the steps of: (a) providing a polynucleotide probe comprising a nucleic acid of
the invention or a subsequence thereof; (b) isolating a nucleic acid from the
environmental sample or treating the environmental sample such that nucleic acid in the
sample is accessible for hybridization to a polynucleotide probe of step (a); (c) combining
the isolated nucleic acid or the treated environmental sample of step (b) with the
polynucleotide probe of step (a); and (d) isolating a nucleic acid that specifically
hybridizes with the polynucleotide probe of step (a), thereby isolating or recovering a
nucleic acid encoding a polypeptide having a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity from an
environmental sample. The environmental sample can comprise a water sample, a liquid
sample, a soil sample, an air sample or a biological sample. In one aspect, the biological
sample can be derived from a bacterial cell, a protozoan cell, an insect cell, a yeast cell, a
plant cell, a fungal cell or a mammalian cell.
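
The primer-pair amplification described above can be checked in silico before any bench work. The following is a minimal sketch, assuming exact primer matches and a single-stranded template written 5' to 3'; the function names are hypothetical, and the six-base primers in the usage note are shortened for readability (the text above contemplates about 10 to 50 consecutive bases).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def in_silico_amplify(template: str, forward_primer: str, reverse_primer: str):
    """Return the predicted amplicon, or None if either primer fails to match.

    Assumptions: exact matching only (no mismatches, no melting-temperature
    model), and the reverse primer anneals downstream of the forward primer.
    """
    template = template.upper()
    fwd = forward_primer.upper()
    # The reverse primer binds the opposite strand, so on the given strand
    # its binding site reads as the primer's reverse complement.
    rev_site = reverse_complement(reverse_primer)
    start = template.find(fwd)
    if start == -1:
        return None
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]
```

For example, with the toy template `TTATGGCCAAACCGTTTGGATCCAA`, forward primer `ATGGCC` and reverse primer `TCCAAA`, the sketch returns the region spanning both primer sites. A primer pair that returns `None` against a sequence of interest would not be expected to amplify it under these simplified assumptions.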

The invention provides methods of generating a variant of a nucleic acid encoding
a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme activity comprising the steps of: (a) providing a template
nucleic acid comprising a nucleic acid of the invention; and (b) modifying, deleting or
adding one or more nucleotides in the template sequence, or a combination thereof, to
generate a variant of the template nucleic acid. In one aspect, the method can further
comprise expressing the variant nucleic acid to generate a variant cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme
polypeptide. The modifications, additions or deletions can be introduced by a method
comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, assembly
PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive
ensemble mutagenesis, exponential ensemble mutagenesis, site-specific mutagenesis,
gene reassembly, Gene Site Saturation Mutagenesis (GSSM), synthetic ligation
reassembly (SLR), Chromosomal Saturation Mutagenesis (CSM) or a combination
thereof. In another aspect, the modifications, additions or deletions are introduced by a
method comprising recombination, recursive sequence recombination, phosphothioate-
modified DNA mutagenesis, uracil-containing template mutagenesis, gapped duplex
mutagenesis, point mismatch repair mutagenesis, repair-deficient host strain mutagenesis,
chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection
mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble
mutagenesis, chimeric nucleic acid multimer creation and a combination thereof.

In one aspect, the method can be iteratively repeated until a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme having an
altered or different activity or an altered or different stability from that of a polypeptide
encoded by the template nucleic acid is produced. In one aspect, the variant cellulase,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme
polypeptide is thermotolerant, and retains some activity after being exposed to an
elevated temperature. In another aspect, the variant cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme polypeptide has increased
glycosylation as compared to the cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme encoded by a template nucleic acid.
Alternatively, the variant cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase polypeptide has a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity under a high
temperature, wherein the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme encoded by the template nucleic acid is not active under
the high temperature. In one aspect, the method can be iteratively repeated until a
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme coding sequence having an altered codon usage from that of the template nucleic
acid is produced. In another aspect, the method can be iteratively repeated until a
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme gene having higher or lower level of message expression or stability from that of
the template nucleic acid is produced.

The invention provides methods for modifying codons in a nucleic acid encoding
a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme activity to increase its expression in a host cell, the
method comprising the following steps: (a) providing a nucleic acid of the invention
encoding a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme activity; and, (b) identifying a non-preferred
or a less preferred codon in the nucleic acid of step (a) and replacing it with a preferred or
neutrally used codon encoding the same amino acid as the replaced codon, wherein a
preferred codon is a codon over-represented in coding sequences in genes in the host cell
and a non-preferred or less preferred codon is a codon under-represented in coding
sequences in genes in the host cell, thereby modifying the nucleic acid to increase its
expression in a host cell.
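
The codon-replacement step (b) above can be illustrated with a short Python sketch. The `PREFERRED` table below is a hypothetical example for an imaginary host and covers only three amino acids; a real table would be derived from codon-usage statistics for the chosen host cell. The function names are likewise assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# Hypothetical preferred codons for an imaginary host (illustration only).
PREFERRED = {"L": "CTG", "K": "AAA", "R": "CGT"}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    cds = cds.upper()
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def replace_codons(cds: str, preferred: dict[str, str] = PREFERRED) -> str:
    """Swap each codon for the host's preferred synonymous codon, if one is listed.

    Codons whose amino acid has no entry in `preferred` are left unchanged,
    so the encoded polypeptide is the same before and after.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        out.append(preferred.get(CODON_TO_AA[codon], codon))
    return "".join(out)
```

Passing a table of under-represented codons instead of preferred ones would apply the same substitution logic in the opposite direction. In either case the translation of the input and output should be identical, which is a convenient sanity check.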

The invention provides methods for modifying codons in a nucleic acid encoding
a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme activity; the method comprising the following steps: (a)
providing a nucleic acid of the invention; and, (b) identifying a codon in the nucleic acid
of step (a) and replacing it with a different codon encoding the same amino acid as the
replaced codon, thereby modifying codons in a nucleic acid encoding a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme.

The invention provides methods for modifying codons in a nucleic acid encoding 
a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme activity to increase its expression in a host cell, the 
method comprising the following steps: (a) providing a nucleic acid of the invention 
encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta- 
glucosidase enzyme polypeptide; and, (b) identifying a non-preferred or a less preferred 
codon in the nucleic acid of step (a) and replacing it with a preferred or neutrally used
codon encoding the same amino acid as the replaced codon, wherein a preferred codon is 
a codon over-represented in coding sequences in genes in the host cell and a non- 
preferred or less preferred codon is a codon under-represented in coding sequences in 
genes in the host cell, thereby modifying the nucleic acid to increase its expression in a 
host cell. 

The invention provides methods for modifying a codon in a nucleic acid encoding 
a polypeptide having a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme activity to decrease its expression in a host cell, the 
method comprising the following steps: (a) providing a nucleic acid of the invention; and 
(b) identifying at least one preferred codon in the nucleic acid of step (a) and replacing it 
with a non-preferred or less preferred codon encoding the same amino acid as the 
replaced codon, wherein a preferred codon is a codon over-represented in coding 
sequences in genes in a host cell and a non-preferred or less preferred codon is a codon
under-represented in coding sequences in genes in the host cell, thereby modifying the 
nucleic acid to decrease its expression in a host cell. In one aspect, the host cell can be a 
bacterial cell, a fungal cell, an insect cell, a yeast cell, a plant cell or a mammalian cell. 

The invention provides methods for producing a library of nucleic acids encoding
a plurality of modified cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme active sites or substrate binding sites, wherein the
modified active sites or substrate binding sites are derived from a first nucleic acid
comprising a sequence encoding a first active site or a first substrate binding site, the
method comprising the following steps: (a) providing a first nucleic acid encoding a first
active site or first substrate binding site, wherein the first nucleic acid sequence comprises
a sequence that hybridizes under stringent conditions to a nucleic acid of the invention,
and the nucleic acid encodes a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme active site or a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme substrate binding site; (b)
providing a set of mutagenic oligonucleotides that encode naturally-occurring amino acid
variants at a plurality of targeted codons in the first nucleic acid; and, (c) using the set of
mutagenic oligonucleotides to generate a set of active site-encoding or substrate binding
site-encoding variant nucleic acids encoding a range of amino acid variations at each
amino acid codon that was mutagenized, thereby producing a library of nucleic acids
encoding a plurality of modified cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme active sites or substrate binding sites. In one
aspect, the method comprises mutagenizing the first nucleic acid of step (a) by a method
comprising an optimized directed evolution system, Gene Site Saturation Mutagenesis
(GSSM), synthetic ligation reassembly (SLR), error-prone PCR, shuffling,
oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo
mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential
ensemble mutagenesis, site-specific mutagenesis, gene reassembly, and a combination
thereof. In another aspect, the method comprises mutagenizing the first nucleic acid of
step (a) or variants by a method comprising recombination, recursive sequence
recombination, phosphothioate-modified DNA mutagenesis, uracil-containing template
mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-
deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion
mutagenesis, restriction-selection mutagenesis, restriction-purification mutagenesis,
artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation
and a combination thereof.
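
The idea of encoding a range of amino acid variations at each targeted codon can be sketched computationally as follows. This is an illustration under simplifying assumptions, not the GSSM procedure itself: each variant carries a single substituted codon, one arbitrary representative codon is used per amino acid, and the name `saturation_library` is hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# One representative codon per amino acid (first in table order; the choice
# is arbitrary here and stop codons are excluded).
REPRESENTATIVE = {}
for _codon, _aa in CODON_TO_AA.items():
    if _aa != "*":
        REPRESENTATIVE.setdefault(_aa, _codon)


def saturation_library(cds: str, target_positions: list[int]):
    """Return (codon_position, new_amino_acid, variant_cds) tuples.

    For each targeted 1-based codon position, one variant is produced for
    each of the 19 amino acids that differ from the original residue.
    """
    cds = cds.upper()
    variants = []
    for pos in target_positions:
        start = 3 * (pos - 1)
        if pos < 1 or start + 3 > len(cds):
            raise ValueError(f"codon position {pos} is outside the sequence")
        original_aa = CODON_TO_AA[cds[start:start + 3]]
        for aa, codon in REPRESENTATIVE.items():
            if aa != original_aa:
                variants.append((pos, aa, cds[:start] + codon + cds[start + 3:]))
    return variants
```

Targeting n codon positions in this way yields 19 × n single-substitution variants, which gives a rough sense of library size before any combination of substitutions is considered.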

The invention provides methods for making a small molecule comprising the
following steps: (a) providing a plurality of biosynthetic enzymes capable of synthesizing
or modifying a small molecule, wherein one of the enzymes comprises a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme encoded
by a nucleic acid of the invention; (b) providing a substrate for at least one of the
enzymes of step (a); and (c) reacting the substrate of step (b) with the enzymes under
conditions that facilitate a plurality of biocatalytic reactions to generate a small molecule
by a series of biocatalytic reactions. The invention provides methods for modifying a
small molecule comprising the following steps: (a) providing a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme, wherein
the enzyme comprises a polypeptide of the invention, or, a polypeptide encoded by a
nucleic acid of the invention, or a subsequence thereof; (b) providing a small molecule;
and (c) reacting the enzyme of step (a) with the small molecule of step (b) under
conditions that facilitate an enzymatic reaction catalyzed by the cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme, thereby
modifying a small molecule by a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzymatic reaction. In one aspect, the method can
comprise a plurality of small molecule substrates for the enzyme of step (a), thereby
generating a library of modified small molecules produced by at least one enzymatic
reaction catalyzed by the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme. In one aspect, the method can comprise a plurality of
additional enzymes under conditions that facilitate a plurality of biocatalytic reactions by
the enzymes to form a library of modified small molecules produced by the plurality of
enzymatic reactions. In another aspect, the method can further comprise the step of
testing the library to determine if a particular modified small molecule that exhibits a
desired activity is present within the library. The step of testing the library can further
comprise the steps of systematically eliminating all but one of the biocatalytic reactions
used to produce a portion of the plurality of the modified small molecules within the
library by testing the portion of the modified small molecule for the presence or absence
of the particular modified small molecule with a desired activity, and identifying at least
one specific biocatalytic reaction that produces the particular modified small molecule of
desired activity.

The invention provides methods for determining a functional fragment of a
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme comprising the steps of: (a) providing a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme, wherein the enzyme
comprises a polypeptide of the invention, or a polypeptide encoded by a nucleic acid of
the invention, or a subsequence thereof; and (b) deleting a plurality of amino acid
residues from the sequence of step (a) and testing the remaining subsequence for a
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme activity, thereby determining a functional fragment of a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme. In one
aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase enzyme activity is measured by providing a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme substrate and detecting a
decrease in the amount of the substrate or an increase in the amount of a reaction product.
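
The delete-and-test loop of step (b) can be planned with a short script that enumerates terminal deletions. The sketch below is an assumption-laden illustration: the function names are hypothetical, only N- and C-terminal deletions are considered, and the `is_active` callback is a stand-in for the wet-lab activity assay (substrate decrease or product increase), which no script can replace.

```python
from typing import Callable, Iterator, Optional


def truncation_fragments(
    protein: str, step: int = 1, min_length: int = 1
) -> Iterator[tuple[int, int, str]]:
    """Yield (start, end, fragment) for combined N- and C-terminal deletions.

    `start` and `end` are 0-based slice bounds into `protein`; the
    full-length sequence itself is not yielded.
    """
    n = len(protein)
    for n_del in range(0, n, step):
        for c_del in range(0, n, step):
            end = n - c_del
            if end - n_del >= min_length and (n_del or c_del):
                yield n_del, end, protein[n_del:end]


def smallest_active_fragment(
    protein: str,
    is_active: Callable[[str], bool],
    step: int = 1,
    min_length: int = 1,
) -> Optional[tuple[int, int, str]]:
    """Return the shortest fragment for which `is_active` is true, or None.

    `is_active` is a placeholder for an experimental activity measurement.
    """
    active = (
        f for f in truncation_fragments(protein, step, min_length) if is_active(f[2])
    )
    return min(active, key=lambda f: len(f[2]), default=None)
```

As a toy usage, `smallest_active_fragment("MKTHEDGLLA", lambda frag: "HEDG" in frag)` treats the made-up motif `HEDG` as if it were sufficient for activity and returns the slice bounds of the shortest fragment containing it. Real enzymes generally need far more than a short motif to fold and function, so the callback here only demonstrates the bookkeeping.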

The invention provides methods for whole cell engineering of new or modified
phenotypes by using real-time metabolic flux analysis, the method comprising the
following steps: (a) making a modified cell by modifying the genetic composition of a
cell, wherein the genetic composition is modified by addition to the cell of a nucleic acid
of the invention; (b) culturing the modified cell to generate a plurality of modified cells;
(c) measuring at least one metabolic parameter of the cell by monitoring the cell culture
of step (b) in real time; and, (d) analyzing the data of step (c) to determine if the measured
parameter differs from a comparable measurement in an unmodified cell under similar
conditions, thereby identifying an engineered phenotype in the cell using real-time
metabolic flux analysis. In one aspect, the genetic composition of the cell can be
modified by a method comprising deletion of a sequence or modification of a sequence in
the cell, or, knocking out the expression of a gene. In one aspect, the method can further
comprise selecting a cell comprising a newly engineered phenotype. In another aspect,
the method can comprise culturing the selected cell, thereby generating a new cell strain
comprising a newly engineered phenotype.

The invention provides methods of increasing thermotolerance or thermostability
of a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme polypeptide, the method comprising glycosylating a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme
polypeptide, wherein the polypeptide comprises at least thirty contiguous amino acids of
a polypeptide of the invention; or a polypeptide encoded by a nucleic acid sequence of the
invention, thereby increasing the thermotolerance or thermostability of the cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase polypeptide. In
one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase enzyme specific activity can be thermostable or thermotolerant at a
temperature in the range from greater than about 37°C to about 95°C.

The invention provides methods for overexpressing a recombinant cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase polypeptide in a
cell comprising expressing a vector comprising a nucleic acid comprising a nucleic acid
of the invention or a nucleic acid sequence of the invention, wherein the sequence
identities are determined by analysis with a sequence comparison algorithm or by visual
inspection, wherein overexpression is effected by use of a high activity promoter, a
dicistronic vector or by gene amplification of the vector.

The invention provides methods of making a transgenic plant comprising the
following steps: (a) introducing a heterologous nucleic acid sequence into the cell,
wherein the heterologous nucleic sequence comprises a nucleic acid sequence of the
invention, thereby producing a transformed plant cell; and (b) producing a transgenic
plant from the transformed cell. In one aspect, the step (a) can further comprise
introducing the heterologous nucleic acid sequence by electroporation or microinjection
of plant cell protoplasts. In another aspect, the step (a) can further comprise introducing
the heterologous nucleic acid sequence directly to plant tissue by DNA particle
bombardment. Alternatively, the step (a) can further comprise introducing the
heterologous nucleic acid sequence into the plant cell DNA using an Agrobacterium
tumefaciens host. In one aspect, the plant cell can be a cane sugar, beet, soybean, tomato,
potato, corn, rice, wheat, tobacco or barley cell.

The invention provides methods of expressing a heterologous nucleic acid
sequence in a plant cell comprising the following steps: (a) transforming the plant cell
with a heterologous nucleic acid sequence operably linked to a promoter, wherein the
heterologous nucleic sequence comprises a nucleic acid of the invention; (b) growing the
plant under conditions wherein the heterologous nucleic acid sequence is expressed in
the plant cell. The invention provides methods of expressing a heterologous nucleic acid
sequence in a plant cell comprising the following steps: (a) transforming the plant cell
with a heterologous nucleic acid sequence operably linked to a promoter, wherein the
heterologous nucleic sequence comprises a sequence of the invention; (b) growing the
plant under conditions wherein the heterologous nucleic acid sequence is expressed in
the plant cell.

The invention provides feeds or foods comprising a polypeptide of the invention, 
or a polypeptide encoded by a nucleic acid of the invention. In one aspect, the invention 
provides a food, feed, a liquid, e.g., a beverage (such as a fruit juice or a beer), a bread or 
a dough or a bread product, or a beverage precursor (e.g., a wort), comprising a 
polypeptide of the invention. The invention provides food or nutritional supplements for 
an animal comprising a polypeptide of the invention, e.g., a polypeptide encoded by the 
nucleic acid of the invention. 

In one aspect, the polypeptide in the food or nutritional supplement can be
glycosylated. The invention provides edible enzyme delivery matrices comprising a
polypeptide of the invention, e.g., a polypeptide encoded by the nucleic acid of the
invention. In one aspect, the delivery matrix comprises a pellet. In one aspect, the
polypeptide can be glycosylated. In one aspect, the cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity is thermotolerant.
In another aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme activity is thermostable.

The invention provides a food, a feed or a nutritional supplement comprising a
polypeptide of the invention. The invention provides methods for utilizing a cellulase,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme as a
nutritional supplement in an animal diet, the method comprising: preparing a nutritional
supplement containing a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme comprising at least thirty contiguous amino acids of a
polypeptide of the invention; and administering the nutritional supplement to an animal.
The animal can be a human, a ruminant or a monogastric animal. The cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme can be
prepared by expression of a polynucleotide encoding the cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme in an organism selected
from the group consisting of a bacterium, a yeast, a plant, an insect, a fungus and an
animal. The organism can be selected from the group consisting of an S. pombe, S.
cerevisiae, Pichia pastoris, E. coli, Streptomyces sp., Bacillus sp. and Lactobacillus sp.

The invention provides an edible enzyme delivery matrix comprising a thermostable
recombinant cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase enzyme, e.g., a polypeptide of the invention. The invention provides
methods for delivering a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme supplement to an animal, the method comprising:
preparing an edible enzyme delivery matrix in the form of pellets comprising a granulate
edible carrier and a thermostable recombinant cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme, wherein the pellets
readily disperse the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme contained therein into aqueous media, and administering the
edible enzyme delivery matrix to the animal. The recombinant cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme can
comprise a polypeptide of the invention. The cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme can be glycosylated to
provide thermostability at pelletizing conditions. The delivery matrix can be formed by
pelletizing a mixture comprising a grain germ and a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme. The pelletizing
conditions can include application of steam. The pelletizing conditions can comprise
application of a temperature in excess of about 80°C for about 5 minutes and the enzyme
retains a specific activity of at least 350 to about 900 units per milligram of enzyme.

In one aspect, the invention provides a pharmaceutical composition comprising a
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme of the invention, or a polypeptide encoded by a nucleic acid of the invention. In
one aspect, the pharmaceutical composition acts as a digestive aid.

In certain aspects, a cellulose-containing compound is contacted with a polypeptide of
the invention having a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme activity at a pH in the range of between about pH 3.0 to
9.0, 10.0, 11.0 or more. In other aspects, a cellulose-containing compound is contacted
with the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase enzyme at a temperature of about 55°C, 60°C, 65°C, 70°C, 75°C, 80°C,
85°C, 90°C, or more.

The details of one or more aspects of the invention are set forth in the 
accompanying drawings and the description below. Other features, objects, and
advantages of the invention will be apparent from the description and drawings, and from
the claims. 

All publications, patents, patent applications, GenBank sequences and 
ATCC deposits, cited herein are hereby expressly incorporated by reference for all 
purposes.

BRIEF DESCRIPTION OF DRAWINGS 
The following drawings are illustrative of aspects of the invention and are not 
meant to limit the scope of the invention as encompassed by the claims. 
Figure 1 is a block diagram of a computer system.

Figure 2 is a flow diagram illustrating one aspect of a process for comparing a new
nucleotide or protein sequence with a database of sequences in order to determine the
homology levels between the new sequence and the sequences in the database.

Figure 3 is a flow diagram illustrating one aspect of a process in a computer for
determining whether two sequences are homologous.

Figure 4 is a flow diagram illustrating one aspect of an identifier process 300 for
detecting the presence of a feature in a sequence.

Figure 5 is an illustration of the structure of cellobiose.

Figures 6 and 7 illustrate the results of a TLC analysis of reaction products from
cellohexaose, as discussed in detail in Example 1, below.

Figure 8 illustrates in graph form data showing the release of cellobiose from
PASC by the exemplary enzyme 22/22a (a CBH) of the invention, as discussed in detail
in Example 2, below.

Figure 9 illustrates in graph form data showing the release of cellobiose from
AVICEL® MCC by the exemplary enzyme 22/22a (a CBH) of the invention, as discussed
in detail in Example 2, below.

Figure 10 illustrates in graphic form data showing a typical GIGAMATRIX™
breakout, where active clones expressing enzyme able to hydrolyze methylumbelliferyl
cellobioside are identified, as discussed in detail in Example 4, below.

Figure 11 illustrates in graph form data showing the activity of selected enzymes
against phosphoric acid-swollen cellulose (PASC) by capillary electrophoresis (CE)
analysis, as discussed in detail in Example 4, below.

Figure 12 illustrates in graph form data from assays of an exemplary enzyme of
the invention and subclone variants in AVICEL® Microcrystalline Cellulose (MCC),
where the reaction products were analyzed by the BCA reducing sugar assay, as discussed
in detail in Example 4, below.

Figure 13 illustrates in graph form data from primary GSSM screening assays, as
discussed in detail in Example 4, below.

Figure 14 illustrates in graph form data from secondary GSSM screening assays, 
as discussed in detail in Example 4, below. 

Figure 15 illustrates in graph form data from mixed, or "blended", GSSM 
screening assays, as discussed in detail in Example 4, below. 

Like reference symbols in the various drawings indicate like elements. 

DETAILED DESCRIPTION
The invention provides polypeptides with cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity, polynucleotides encoding
them, and methods of making and using these polynucleotides and polypeptides. The
invention also provides cellulase enzymes, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzymes, polynucleotides encoding these enzymes,
and the use of such polynucleotides and polypeptides.

In one aspect, the invention provides a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase, with an increased catalytic rate,
improving the process of substrate hydrolysis. This increased efficiency in catalytic rate
leads to an increased efficiency in producing sugars that will subsequently be used by
microorganisms for ethanol production. In one aspect, microorganisms generating
enzyme of the invention are used with ethanol-producing microorganisms. Thus, the
invention provides methods for ethanol production and making "clean fuels" based on
ethanol, e.g., for transportation using bioethanol.

In one aspect, the invention provides compositions (e.g., enzyme preparations,
feeds, drugs, dietary supplements) comprising the enzymes, polypeptides or
polynucleotides of the invention. These compositions can be formulated in a variety of
forms, e.g., as liquids, gels, pills, tablets, sprays, powders, food, feed pellets or
encapsulated forms, including nanoencapsulated forms.

Assays for measuring cellulase activity, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase activity, e.g., for determining if a polypeptide has
cellulase activity, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-
glucosidase activity, are well known in the art and are within the scope of the invention;
see, e.g., Baker WL, Panow A, Estimation of cellulase activity using a glucose-oxidase-
Cu(II) reducing assay for glucose, J Biochem Biophys Methods. 1991 Dec, 23(4):265-73;
Sharrock KR, Cellulase assay methods: a review, J Biochem Biophys Methods. 1988 Oct,
17(2):81-105; Carder JH, Detection and quantitation of cellulase by Congo red staining of
substrates in a cup-plate diffusion assay, Anal Biochem. 1986 Feb 15, 153(1):75-9;
Canevascini G., A cellulase assay coupled to cellobiose dehydrogenase, Anal Biochem.
1985 Jun, 147(2):419-27; Huang JS, Tang J, Sensitive assay for cellulase and dextranase,
Anal Biochem. 1976 Jun, 73(2):369-77.

The pH of reaction conditions utilized by the invention is another variable
parameter for which the invention provides. In certain aspects, the reaction is
conducted at a pH in the range of about 3.0 to about 9.0. In other aspects, the pH is about 4.5 or
the pH is about 7.5 or the pH is about 9. Reactions conducted under alkaline
conditions also can be advantageous, e.g., in some industrial or pharmaceutical
applications of enzymes of the invention.

The invention provides cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase polypeptides of the invention in a variety of forms
and formulations. In the methods of the invention, cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase polypeptides of the invention are
used in a variety of forms and formulations. For example, purified cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase polypeptides can
be used in enzyme preparations deployed in bioethanol production or in pharmaceutical
or dietary aid applications. Alternatively, the enzymes of the invention can be used
directly in processes to produce bioethanol, make clean fuels, process biowastes, process
foods, liquids or feeds, and the like.

Alternatively, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase polypeptides of the invention can be expressed in a
microorganism using procedures known in the art. In other aspects, the cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase polypeptides of the
invention can be immobilized on a solid support prior to use in the methods of the
invention. Methods for immobilizing enzymes on solid supports are commonly known in
the art; see, for example, J. Mol. Cat. B: Enzymatic 6 (1999) 29-39; Chivata et al.,
Biocatalysis: Immobilized cells and enzymes, J Mol. Cat. 37 (1986) 1-24; Sharma et al.,
Immobilized Biomaterials Techniques and Applications, Angew. Chem. Int. Ed. Engl. 21
(1982) 837-54; Laskin (Ed.), Enzymes and Immobilized Cells in Biotechnology.

Nucleic Acids, Probes and Inhibitory Molecules

The invention provides isolated and recombinant nucleic acids, e.g., see Tables 1,
2, and 3, Examples 1 and 4, below, and Sequence Listing; nucleic acids encoding
polypeptides, including the exemplary polynucleotide sequences of the invention, e.g.,
see Table 1 and Sequence Listing; including expression cassettes such as expression
vectors and various cloning vehicles comprising nucleic acids of the invention. The
invention also includes methods for discovering, identifying or isolating new cellulase,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase polypeptide
sequences using the nucleic acids of the invention. The invention also includes methods
for inhibiting the expression of cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase encoding genes and transcripts using the nucleic acids
of the invention.

Also provided are methods for modifying the nucleic acids of the invention,
including making variants of nucleic acids of the invention, by, e.g., synthetic ligation
reassembly, optimized directed evolution system and/or saturation mutagenesis such as
gene site saturation mutagenesis (GSSM). The term "saturation mutagenesis", "Gene Site
Saturation Mutagenesis" or "GSSM" includes a method that uses degenerate
oligonucleotide primers to introduce point mutations into a polynucleotide, as described
in detail, below. The term "optimized directed evolution system" or "optimized directed
evolution" includes a method for reassembling fragments of related nucleic acid
sequences, e.g., related genes, as explained in detail, below. The term "synthetic
ligation reassembly" or "SLR" includes a method of ligating oligonucleotide fragments in
a non-stochastic fashion, as explained in detail, below. The term "variant" refers to
polynucleotides or polypeptides of the invention modified at one or more base pairs,
codons, introns, exons, or amino acid residues (respectively) that still retain the
biological activity of a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase of the invention. Variants can be produced by any number of
means, including methods such as, for example, error-prone PCR, shuffling,
oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo
mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble
mutagenesis, site-specific mutagenesis, gene reassembly, GSSM and any combination thereof.
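As a rough illustration of how degenerate oligonucleotide primers can introduce point mutations at a single codon, the following Python sketch expands a degenerate codon into the concrete codons it represents and substitutes each one at a chosen codon position. The NNK codon, the function names and the three-codon template are assumptions made for illustration only; this passage does not state which degenerate codon the primers carry.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Subset of the IUPAC degenerate base symbols needed for this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand_degenerate(codon):
    """Return every concrete codon represented by a degenerate codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon))]


def saturation_variants(seq, codon_index, degenerate="NNK"):
    """Replace one codon of an in-frame sequence with each expanded codon."""
    start = 3 * codon_index
    return [seq[:start] + c + seq[start + 3:] for c in expand_degenerate(degenerate)]


nnk = expand_degenerate("NNK")
encoded = {CODON_TABLE[c] for c in nnk}
# Hypothetical template: Met-Ala-Trp, mutated at the middle codon.
variants = saturation_variants("ATGGCTTGG", 1)
print(len(nnk), len(encoded - {"*"}), len(variants))
```

Under these assumptions, NNK expands to 32 codons that together encode all 20 amino acids and a single stop codon (TAG), which is one reason such a codon might be chosen for a saturation library.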

The nucleic acids of the invention can be made, isolated and/or manipulated by,
e.g., cloning and expression of cDNA libraries, amplification of message or genomic
DNA by PCR, and the like. For example, exemplary sequences of the invention were
initially derived from environmental sources. Thus, in one aspect, the invention provides
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme-encoding nucleic acids, and the polypeptides encoded by them, having a common
novelty in that they are derived from a common source, e.g., an environmental, mixed
culture, or a bacterial source.

In practicing the methods of the invention, homologous genes can be modified by
manipulating a template nucleic acid, as described herein. The invention can be practiced
in conjunction with any method or protocol or device known in the art, which are well
described in the scientific and patent literature.

The phrases "nucleic acid" or "nucleic acid sequence" as used herein refer to an
oligonucleotide, nucleotide, polynucleotide, or to a fragment of any of these, to DNA or
RNA of genomic or synthetic origin which may be single-stranded or double-stranded
and may represent a sense or antisense (complementary) strand, to peptide nucleic acid
(PNA), or to any DNA-like or RNA-like material, natural or synthetic in origin. The
phrases "nucleic acid" or "nucleic acid sequence" include an oligonucleotide, nucleotide,
polynucleotide, or a fragment of any of these, DNA or RNA (e.g., mRNA, rRNA,
tRNA, iRNA) of genomic or synthetic origin which may be single-stranded or
double-stranded and may represent a sense or antisense strand, peptide nucleic acid
(PNA), or any DNA-like or RNA-like material, natural or synthetic in origin, including,
e.g., iRNA, ribonucleoproteins (e.g., double stranded iRNAs, e.g., iRNPs). The term
encompasses nucleic acids, i.e., oligonucleotides, containing known analogues of natural
nucleotides. The term also encompasses nucleic-acid-like structures with synthetic
backbones, see e.g., Mata (1997) Toxicol. Appl. Pharmacol. 144:189-197;
Strauss-Soukup (1997) Biochemistry 36:8692-8698; Samstag (1996) Antisense Nucleic Acid
Drug Dev 6:153-156. "Oligonucleotide" includes either a single stranded
polydeoxynucleotide or two complementary polydeoxynucleotide strands which may be
chemically synthesized. Such synthetic oligonucleotides have no 5' phosphate and thus
will not ligate to another oligonucleotide without adding a phosphate with an ATP in the
presence of a kinase. A synthetic oligonucleotide can ligate to a fragment that has not
been dephosphorylated.

A "coding sequence of" or a "nucleotide sequence encoding" a particular
polypeptide or protein is a nucleic acid sequence which is transcribed and translated into
a polypeptide or protein when placed under the control of appropriate regulatory
sequences. The term "gene" means the segment of DNA involved in producing a
polypeptide chain; it includes regions preceding and following the coding region (leader
and trailer) as well as, where applicable, intervening sequences (introns) between
individual coding segments (exons). A promoter sequence is "operably linked to" a
coding sequence when RNA polymerase which initiates transcription at the promoter will
transcribe the coding sequence into mRNA. "Operably linked" as used herein refers to a
functional relationship between two or more nucleic acid (e.g., DNA) segments. It can
refer to the functional relationship of a transcriptional regulatory sequence to a
transcribed sequence. For example, a promoter is operably linked to a coding sequence,
such as a nucleic acid of the invention, if it stimulates or modulates the transcription of
the coding sequence in an appropriate host cell or other expression system. Generally,
promoter transcriptional regulatory sequences that are operably linked to a transcribed
sequence are physically contiguous to the transcribed sequence, i.e., they are cis-acting.
However, some transcriptional regulatory sequences, such as enhancers, need not be
physically contiguous or located in close proximity to the coding sequences whose
transcription they enhance.

The term "expression cassette" as used herein refers to a nucleotide sequence
which is capable of affecting expression of a structural gene (i.e., a protein coding
sequence, such as a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme of the invention) in a host compatible with such sequences.
Expression cassettes include at least a promoter operably linked with the polypeptide
coding sequence; and, optionally, with other sequences, e.g., transcription termination
signals. Additional factors necessary or helpful in effecting expression may also be used,
e.g., enhancers, alpha-factors. Thus, expression cassettes also include plasmids,
expression vectors, recombinant viruses, any form of recombinant "naked DNA" vector,
and the like. A "vector" comprises a nucleic acid which can infect, transfect, transiently
or permanently transduce a cell. It will be recognized that a vector can be a naked nucleic
acid, or a nucleic acid complexed with protein or lipid. The vector optionally comprises
viral or bacterial nucleic acids and/or proteins, and/or membranes (e.g., a cell membrane,
a viral lipid envelope, etc.). Vectors include, but are not limited to, replicons (e.g., RNA
replicons, bacteriophages) to which fragments of DNA may be attached and become
replicated. Vectors thus include, but are not limited to, RNA, autonomous self-replicating
circular or linear DNA or RNA (e.g., plasmids, viruses, and the like, see, e.g., U.S. Patent
No. 5,217,879), and include both the expression and non-expression plasmids. Where a
recombinant microorganism or cell culture is described as hosting an "expression vector"
this includes both extra-chromosomal circular and linear DNA and DNA that has been
incorporated into the host chromosome(s). Where a vector is being maintained by a host
cell, the vector may either be stably replicated by the cells during mitosis as an
autonomous structure, or be incorporated within the host's genome.

As used herein, the term "recombinant" encompasses nucleic acids adjacent to a
"backbone" nucleic acid to which they are not adjacent in their natural environment. In
one aspect, to be "enriched" the nucleic acids will represent about 5% or more of the
number of nucleic acid inserts in a population of nucleic acid backbone molecules.
Backbone molecules according to the invention include nucleic acids such as expression
vectors, self-replicating nucleic acids, viruses, integrating nucleic acids and other
vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest.
In one aspect, the enriched nucleic acids represent about 15% or more of the number of
nucleic acid inserts in the population of recombinant backbone molecules. In one aspect,
the enriched nucleic acids represent about 50% or more of the number of nucleic acid
inserts in the population of recombinant backbone molecules. In one aspect, the enriched
nucleic acids represent about 90% or more of the number of nucleic acid inserts in the
population of recombinant backbone molecules.

One aspect of the invention is an isolated or recombinant nucleic acid comprising 
one of the sequences of the invention, or a fragment comprising at least 10, 15, 20, 25, 30, 
35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more consecutive bases of a nucleic 
acid of the invention. The isolated or recombinant nucleic acids may comprise DNA, 
including cDNA, genomic DNA and synthetic DNA. The DNA may be double-stranded 
or single-stranded and if single stranded may be the coding strand or non-coding (anti- 
sense) strand. Alternatively, the isolated or recombinant nucleic acids comprise RNA. 
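
To make the notion of a fragment of at least a given number of consecutive bases concrete, the following Python sketch checks whether a query sequence shares a run of consecutive bases with a reference sequence on either the coding or the non-coding (anti-sense) strand. The sequences, the function names and the default threshold of 10 bases are hypothetical and serve only as an illustration; they are not sequences of the invention.

```python
# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the anti-sense strand of a DNA sequence, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def shares_fragment(query, reference, min_len=10):
    """True if query and reference share min_len consecutive bases on either strand."""
    targets = (reference, reverse_complement(reference))
    # Every window of min_len consecutive bases in the query.
    windows = (query[i:i + min_len] for i in range(len(query) - min_len + 1))
    return any(w in t for w in windows for t in targets)


reference = "ATGGCTAGCTAGGATCCGATTACA"          # hypothetical reference sequence
sense_fragment = reference[5:17]                 # 12 consecutive bases, coding strand
antisense_fragment = reverse_complement(sense_fragment)

print(shares_fragment(sense_fragment, reference))
print(shares_fragment(antisense_fragment, reference))
print(shares_fragment("TTTTTTTTTTTT", reference))
```

A query shorter than the threshold yields no windows and is therefore reported as not sharing a fragment.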

The isolated or recombinant nucleic acids of the invention may be used to prepare 
one of the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 
25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids of one of the 
polypeptides of the invention. Accordingly, another aspect of the invention is an isolated 
or recombinant nucleic acid which encodes one of the polypeptides of the invention, or 
fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more 
consecutive amino acids of one of the polypeptides of the invention. The coding
sequences of these nucleic acids may be identical to one of the coding sequences of one
of the nucleic acids of the invention or may be different coding sequences which encode
one of the polypeptides of the invention, or fragments having at least 5, 10, 15, 20, 25,
30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids of one of the polypeptides
of the invention, as a result of the redundancy or degeneracy of the genetic code. The
genetic code is well known to
those of skill in the art and can be obtained, e.g., on page 214 of B. Lewin, Genes VI, 
Oxford University Press, 1997. 
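
The degeneracy of the genetic code can be illustrated with a short Python sketch that counts, and for short peptides enumerates, the distinct coding sequences that encode the same amino acid sequence. The peptide "MLS" and the function names are hypothetical examples chosen for illustration; the codon table is the standard genetic code.

```python
from itertools import product
from math import prod

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reverse lookup: amino acid -> list of synonymous codons.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def translate(seq):
    """Translate an in-frame DNA sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def count_coding_sequences(peptide):
    """Number of distinct DNA sequences encoding the peptide."""
    return prod(len(CODONS_FOR[aa]) for aa in peptide)


def all_coding_sequences(peptide):
    """Enumerate every DNA sequence encoding the peptide (short peptides only)."""
    return ["".join(c) for c in product(*(CODONS_FOR[aa] for aa in peptide))]


sequences = all_coding_sequences("MLS")  # hypothetical tripeptide
print(count_coding_sequences("MLS"), len(sequences))
```

Because the count grows multiplicatively with peptide length, enumeration is practical only for short fragments; the count itself remains cheap to compute.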

The nucleic acids encoding polypeptides of the invention include but are not
limited to: the coding sequence of a nucleic acid of the invention and additional coding
sequences, such as leader sequences or proprotein sequences and non-coding sequences,
such as introns or non-coding sequences 5' and/or 3' of the coding sequence. Thus, as
used herein, the term "polynucleotide encoding a polypeptide" encompasses a
polynucleotide which includes the coding sequence for the polypeptide as well as a
polynucleotide which includes additional coding and/or non-coding sequence.

In one aspect, the nucleic acid sequences of the invention are mutagenized using
conventional techniques, such as site directed mutagenesis, or other techniques familiar to
those skilled in the art, to introduce silent changes into the polynucleotides of the
invention. As used herein, "silent changes" include, for example, changes which do not
alter the amino acid sequence encoded by the polynucleotide. Such changes may be
desirable in order to increase the level of the polypeptide produced by host cells
containing a vector encoding the polypeptide by introducing codons or codon pairs which
occur frequently in the host organism.
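
A minimal Python sketch of such a silent change follows: each codon is swapped for a synonymous codon assumed to be preferred by the host, and the encoded amino acid sequence is checked to be unchanged. The codon preferences, the example sequence and the function names are hypothetical and are not taken from any particular host organism or from the sequences of the invention.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame DNA sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def recode_silently(seq, preferred):
    """Swap codons for preferred synonymous codons without changing the protein."""
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE[codon]
        candidate = preferred.get(aa, codon)
        # Guard: only accept the candidate if it encodes the same amino acid.
        out.append(candidate if CODON_TABLE[candidate] == aa else codon)
    return "".join(out)


# Hypothetical host preferences: amino acid -> assumed frequently used codon.
PREFERRED = {"L": "CTG", "S": "AGC", "R": "CGT"}

original = "ATGCTATCAAGGTAA"  # hypothetical coding sequence: Met-Leu-Ser-Arg-stop
recoded = recode_silently(original, PREFERRED)
print(recoded, translate(original) == translate(recoded))
```

The guard inside the loop means a mistaken entry in the preference table cannot alter the encoded polypeptide; the original codon is simply kept.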

The invention also relates to polynucleotides which have nucleotide changes
which result in amino acid substitutions, additions, deletions, fusions and truncations in
the polypeptides of the invention. Such nucleotide changes may be introduced using
techniques such as site directed mutagenesis, random chemical mutagenesis, exonuclease
III deletion and other recombinant DNA techniques. Alternatively, such nucleotide
changes may be naturally occurring allelic variants which are isolated by identifying
nucleic acids which specifically hybridize to probes comprising at least 10, 15, 20, 25, 30,
35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases of one of the sequences of
the invention (or the sequences complementary thereto) under conditions of high,
moderate, or low stringency as provided herein.

General Techniques 

The nucleic acids used to practice this invention, whether RNA, siRNA, miRNA,
antisense nucleic acid, cDNA, genomic DNA, vectors, viruses or hybrids thereof, may be
isolated from a variety of sources, genetically engineered, amplified, and/or
expressed/generated recombinantly. Recombinant polypeptides (e.g., cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes) generated
from these nucleic acids can be individually isolated or cloned and tested for a desired
activity. Any recombinant expression system can be used, including bacterial, mammalian,
yeast, insect or plant cell expression systems.

Alternatively, these nucleic acids can be synthesized in vitro by well-known
chemical synthesis techniques, as described in, e.g., Adams (1983) J. Am. Chem. Soc.
105:661; Belousov (1997) Nucleic Acids Res. 25:3440-3444; Frenkel (1995) Free Radic.
Biol. Med. 19:373-380; Blommers (1994) Biochemistry 33:7886-7896; Narang (1979)
Meth. Enzymol. 68:90; Brown (1979) Meth. Enzymol. 68:109; Beaucage (1981) Tetra.
Lett. 22:1859; U.S. Patent No. 4,458,066.

Techniques for the manipulation of nucleic acids, such as, e.g., subcloning,
labeling probes (e.g., random-primer labeling using Klenow polymerase, nick translation,
amplification), sequencing, hybridization and the like are well described in the scientific
and patent literature, see, e.g., Sambrook, ed., MOLECULAR CLONING: A
LABORATORY MANUAL (2ND ED.), Vols. 1-3, Cold Spring Harbor Laboratory,
(1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, Ausubel, ed. John
Wiley & Sons, Inc., New York (1997); LABORATORY TECHNIQUES IN
BIOCHEMISTRY AND MOLECULAR BIOLOGY: HYBRIDIZATION WITH
NUCLEIC ACID PROBES, Part I. Theory and Nucleic Acid Preparation, Tijssen, ed.
Elsevier, N.Y. (1993).

Another useful means of obtaining and manipulating nucleic acids used to practice
the methods of the invention is to clone from genomic samples, and, if desired, screen and
re-clone inserts isolated or amplified from, e.g., genomic clones or cDNA clones.
Sources of nucleic acid used in the methods of the invention include genomic or cDNA
libraries contained in, e.g., mammalian artificial chromosomes (MACs), see, e.g., U.S.
Patent Nos. 5,721,118; 6,025,155; human artificial chromosomes, see, e.g., Rosenfeld
(1997) Nat. Genet. 15:333-335; yeast artificial chromosomes (YAC); bacterial artificial
chromosomes (BAC); P1 artificial chromosomes, see, e.g., Woon (1998) Genomics
50:306-316; P1-derived vectors (PACs), see, e.g., Kern (1997) Biotechniques
23:120-124; cosmids, recombinant viruses, phages or plasmids.

In one aspect, a nucleic acid encoding a polypeptide of the invention is assembled
in appropriate phase with a leader sequence capable of directing secretion of the
translated polypeptide or fragment thereof.

The invention provides fusion proteins and nucleic acids encoding them. A
polypeptide of the invention can be fused to a heterologous peptide or polypeptide, such
as N-terminal identification peptides which impart desired characteristics, such as
increased stability or simplified purification. Peptides and polypeptides of the invention
can also be synthesized and expressed as fusion proteins with one or more additional
domains linked thereto for, e.g., producing a more immunogenic peptide, to more readily
isolate a recombinantly synthesized peptide, to identify and isolate antibodies and
antibody-expressing B cells, and the like. Detection and purification facilitating domains
include, e.g., metal chelating peptides such as polyhistidine tracts and
histidine-tryptophan modules that allow purification on immobilized metals, protein A
domains that allow purification on immobilized immunoglobulin, and the domain utilized in
the FLAGS extension/affinity purification system (Immunex Corp, Seattle WA). The
inclusion of cleavable linker sequences such as Factor Xa or enterokinase (Invitrogen,
San Diego CA) between a purification domain and the motif-comprising peptide or
polypeptide can facilitate purification. For example, an expression vector can include an
epitope-encoding nucleic acid sequence linked to six histidine residues followed by a
thioredoxin and an enterokinase cleavage site (see e.g., Williams (1995) Biochemistry
34:1787-1797; Dobeli (1998) Protein Expr. Purif. 12:404-414). The histidine residues
facilitate detection and purification while the enterokinase cleavage site provides a means
for purifying the epitope from the remainder of the fusion protein. Technology pertaining
to vectors encoding fusion proteins and application of fusion proteins is well described
in the scientific and patent literature, see e.g., Kroll (1993) DNA Cell. Biol. 12:441-53.

Transcriptional and translational control sequences

The invention provides nucleic acid (e.g., DNA) sequences of the invention
operatively linked to expression (e.g., transcriptional or translational) control sequence(s),
e.g., promoters or enhancers, to direct or modulate RNA synthesis/expression. The
expression control sequence can be in an expression vector. Exemplary bacterial
promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Exemplary eukaryotic
promoters include CMV immediate early, HSV thymidine kinase, early and late SV40,
LTRs from retrovirus, and mouse metallothionein I.

As used herein, the term "promoter" includes all sequences capable of driving
transcription of a coding sequence in a cell, e.g., a plant or animal cell. Thus, promoters
used in the constructs of the invention include cis-acting transcriptional control elements
and regulatory sequences that are involved in regulating or modulating the timing and/or
rate of transcription of a gene. For example, a promoter can be a cis-acting
transcriptional control element, including an enhancer, a promoter, a transcription
terminator, an origin of replication, a chromosomal integration sequence, 5' and 3'
untranslated regions, or an intronic sequence, which are involved in transcriptional
regulation. These cis-acting sequences can interact with proteins or other biomolecules to
carry out (turn on/off, regulate, modulate, etc.) transcription. "Constitutive" promoters
are those that drive expression continuously under most environmental conditions and
states of development or cell differentiation. "Inducible" or "regulatable" promoters
direct expression of the nucleic acid of the invention under the influence of environmental
conditions or developmental conditions. Examples of environmental conditions that may
affect transcription by inducible promoters include anaerobic conditions, elevated
temperature, drought, or the presence of light.

"Tissue-specific" promoters are transcriptional control elements that are only
active in particular cells or tissues or organs, e.g., in plants or animals. Tissue-specific
regulation may be achieved by certain intrinsic factors which ensure that genes encoding
proteins specific to a given tissue are expressed. Such factors are known to exist in
mammals and plants so as to allow for specific tissues to develop.

Promoters suitable for expressing a polypeptide in bacteria include the E. coli lac
or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter,
the gpt promoter, the lambda PR promoter, the lambda PL promoter, promoters from
operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the
acid phosphatase promoter. Eukaryotic promoters include the CMV immediate early
promoter, the HSV thymidine kinase promoter, heat shock promoters, the early and late
SV40 promoter, LTRs from retroviruses, and the mouse metallothionein-I promoter.
Other promoters known to control expression of genes in prokaryotic or eukaryotic cells
or their viruses may also be used. Promoters suitable for expressing the polypeptide or
fragment thereof in bacteria include the E. coli lac or trp promoters, the lacI promoter, the
lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda PR
promoter, the lambda PL promoter, promoters from operons encoding glycolytic enzymes
such as 3-phosphoglycerate kinase (PGK) and the acid phosphatase promoter. Fungal
promoters include the alpha-factor promoter. Eukaryotic promoters include the CMV
immediate early promoter, the HSV thymidine kinase promoter, heat shock promoters,
the early and late SV40 promoter, LTRs from retroviruses and the mouse
metallothionein-I promoter. Other promoters known to control expression of genes in
prokaryotic or eukaryotic cells or their viruses may also be used.

Tissue-Specific Plant Promoters 

The invention provides expression cassettes that can be expressed in a
tissue-specific manner, e.g., that can express a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme of the invention in a
tissue-specific manner. The invention also provides plants or seeds that express a
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme of the invention in a tissue-specific manner. The tissue-specificity can be seed
specific, stem specific, leaf specific, root specific, fruit specific and the like.

The term "plant" includes whole plants, plant parts (e.g., leaves, stems, flowers,
roots, etc.), plant protoplasts, seeds and plant cells and progeny of same. The class of
plants which can be used in the method of the invention is generally as broad as the class
of higher plants amenable to transformation techniques, including angiosperms
(monocotyledonous and dicotyledonous plants), as well as gymnosperms. It includes
plants of a variety of ploidy levels, including polyploid, diploid, haploid and hemizygous
states. As used herein, the term "transgenic plant" includes plants or plant cells into
which a heterologous nucleic acid sequence has been inserted, e.g., the nucleic acids and
various recombinant constructs (e.g., expression cassettes) of the invention.

In one aspect, a constitutive promoter such as the CaMV 35S promoter can be
used for expression in specific parts of the plant or seed or throughout the plant. For
example, for overexpression, a plant promoter fragment can be employed which will
direct expression of a nucleic acid in some or all tissues of a plant, e.g., a regenerated
plant. Such promoters are referred to herein as "constitutive" promoters and are active
under most environmental conditions and states of development or cell differentiation.
Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S
transcription initiation region, the 1'- or 2'- promoter derived from T-DNA of
Agrobacterium tumefaciens, and other transcription initiation regions from various plant
genes known to those of skill. Such genes include, e.g., ACT11 from Arabidopsis (Huang
(1996) Plant Mol. Biol. 33:125-139); Cat3 from Arabidopsis (GenBank No. U43147,
Zhong (1996) Mol. Gen. Genet. 251:196-203); the gene encoding stearoyl-acyl carrier
protein desaturase from Brassica napus (GenBank No. X74782, Solocombe (1994) Plant
Physiol. 104:1167-1176); GPc1 from maize (GenBank No. X15596; Martinez (1989) J.
Mol. Biol. 208:551-565); the Gpc2 from maize (GenBank No. U45855, Manjunath (1997)
Plant Mol. Biol. 33:97-112); plant promoters described in U.S. Patent Nos. 4,962,028;
5,633,440.

The invention uses tissue-specific or constitutive promoters derived from viruses
which can include, e.g., the tobamovirus subgenomic promoter (Kumagai (1995) Proc.
Natl. Acad. Sci. USA 92:1679-1683); the rice tungro bacilliform virus (RTBV), which
replicates only in phloem cells in infected rice plants, with its promoter which drives
strong phloem-specific reporter gene expression; the cassava vein mosaic virus (CVMV)
promoter, with highest activity in vascular elements, in leaf mesophyll cells, and in root
tips (Verdaguer (1996) Plant Mol. Biol. 31:1129-1139).



WO 2006/101584 PCT/US2006/002516 564462014240/D2150-2WO

In one aspect, the plant promoter directs expression of cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme-expressing nucleic acid in a specific tissue, organ or cell type (i.e.,
tissue-specific promoters) or may be otherwise under more precise environmental or
developmental control or under the control of an inducible promoter. Examples of
environmental conditions that may affect transcription include anaerobic conditions,
elevated temperature, the presence of light, or spraying with chemicals/hormones. For
example, the invention incorporates the drought-inducible promoter of maize (Busk (1997)
supra); the cold, drought, and high salt inducible promoter from potato (Kirch (1997)
Plant Mol. Biol. 33:897-909).

In one aspect, tissue-specific promoters promote transcription only within a
certain time frame of developmental stage within that tissue. See, e.g., Blazquez (1998)
Plant Cell 10:791-800, characterizing the Arabidopsis LEAFY gene promoter. See also
Cardon (1997) Plant J 12:367-77, describing the transcription factor SPL3, which
recognizes a conserved sequence motif in the promoter region of the A. thaliana floral
meristem identity gene AP1; and Mandel (1995) Plant Molecular Biology, Vol. 29, pp
995-1004, describing the meristem promoter eIF4. Tissue specific promoters which are
active throughout the life cycle of a particular tissue can be used. In one aspect, the
nucleic acids of the invention are operably linked to a promoter active primarily only in
cotton fiber cells. In one aspect, the nucleic acids of the invention are operably linked to
a promoter active primarily during the stages of cotton fiber cell elongation, e.g., as
described by Rinehart (1996) supra. The nucleic acids can be operably linked to the
Fbl2A gene promoter to be preferentially expressed in cotton fiber cells (Ibid). See also,
John (1997) Proc. Natl. Acad. Sci. USA 89:5769-5773; John, et al., U.S. Patent Nos.
5,608,148 and 5,602,321, describing cotton fiber-specific promoters and methods for the
construction of transgenic cotton plants. Root-specific promoters may also be used to
express the nucleic acids of the invention. Examples of root-specific promoters include
the promoter from the alcohol dehydrogenase gene (DeLisle (1990) Int. Rev. Cytol.
123:39-60). Other promoters that can be used to express the nucleic acids of the
invention include, e.g., ovule-specific, embryo-specific, endosperm-specific,
integument-specific, seed coat-specific promoters, or some combination thereof; a
leaf-specific promoter (see, e.g., Busk (1997) Plant J. 11:1285-1295, describing a
leaf-specific promoter in maize); the ORF13 promoter from Agrobacterium rhizogenes
(which exhibits high activity in roots, see, e.g., Hansen (1997) supra); a maize pollen
specific promoter (see, e.g., Guerrero (1990) Mol. Gen. Genet. 224:161-168); a tomato
promoter active during fruit ripening, senescence and abscission of leaves and, to a lesser
extent, of flowers can be used (see, e.g., Blume (1997) Plant J. 12:731-746); a
pistil-specific promoter from the potato SK2 gene (see, e.g., Ficker (1997) Plant Mol.
Biol. 35:425-431); the Blec4 gene from pea, which is active in epidermal tissue of
vegetative and floral shoot apices of transgenic alfalfa making it a useful tool to target
the expression of foreign genes to the epidermal layer of actively growing shoots or
fibers; the ovule-specific BEL1 gene (see, e.g., Reiser (1995) Cell 83:735-742, GenBank
No. U39944); and/or, the promoter in Klee, U.S. Patent No. 5,589,583, describing a plant
promoter region capable of conferring high levels of transcription in meristematic tissue
and/or rapidly dividing cells.

In one aspect, plant promoters which are inducible upon exposure to plant
hormones, such as auxins, are used to express the nucleic acids of the invention. For
example, the invention can use the auxin-response elements E1 promoter fragment
(AuxREs) in the soybean (Glycine max L.) (Liu (1997) Plant Physiol. 115:397-407); the
auxin-responsive Arabidopsis GST6 promoter (also responsive to salicylic acid and
hydrogen peroxide) (Chen (1996) Plant J. 10:955-966); the auxin-inducible parC
promoter from tobacco (Sakai (1996) 37:906-913); a plant biotin response element (Streit
(1997) Mol. Plant Microbe Interact. 10:933-937); and, the promoter responsive to the
stress hormone abscisic acid (Sheen (1996) Science 274:1900-1902).

The nucleic acids of the invention can also be operably linked to plant promoters 
which are inducible upon exposure to chemical reagents which can be applied to the 
plant, such as herbicides or antibiotics. For example, the maize In2-2 promoter, activated 
by benzenesulfonamide herbicide safeners, can be used (De Veylder (1997) Plant Cell 
Physiol. 38:568-577); application of different herbicide safeners induces distinct gene 
expression patterns, including expression in the root, hydathodes, and the shoot apical 
meristem. Coding sequences can be under the control of, e.g., a tetracycline-inducible 
promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. 
(oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or a salicylic 
acid-responsive element (Stange (1997) Plant J. 11:1315-1324). Using chemically- (e.g., 
hormone- or pesticide-) induced promoters, i.e., promoters responsive to a chemical which 
can be applied to the transgenic plant in the field, expression of a polypeptide of the 
invention can be induced at a particular stage of development of the plant. Thus, the 
invention also provides for transgenic plants containing an inducible gene encoding 
polypeptides of the invention whose host range is limited to target plant species, such as 
corn, rice, barley, soybean, tomato, wheat, potato or other crops, inducible at any stage of 
development of the crop. 

One of skill will recognize that a tissue-specific plant promoter may drive 
expression of operably linked sequences in tissues other than the target tissue. Thus, in 
one aspect, a tissue-specific promoter is one that drives expression preferentially in the 
target tissue or cell type, but may also lead to some expression in other tissues as well. 

The nucleic acids of the invention can also be operably linked to plant promoters 
which are inducible upon exposure to chemical reagents. These reagents include, e.g., 
herbicides, synthetic auxins, or antibiotics which can be applied, e.g., sprayed, onto 
transgenic plants. Inducible expression of the cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme-producing nucleic acids of 
the invention will allow the grower to select plants with the optimal cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme expression 
and/or activity. The development of plant parts can thus be controlled. In this way the 
invention provides the means to facilitate the harvesting of plants and plant parts. For 
example, in various embodiments, the maize In2-2 promoter, activated by 
benzenesulfonamide herbicide safeners, is used (De Veylder (1997) Plant Cell Physiol. 
38:568-577); application of different herbicide safeners induces distinct gene expression 
patterns, including expression in the root, hydathodes, and the shoot apical meristem. 
Coding sequences of the invention are also under the control of a tetracycline-inducible 
promoter, e.g., as described with transgenic tobacco plants containing the Avena sativa L. 
(oat) arginine decarboxylase gene (Masgrau (1997) Plant J. 11:465-473); or, a salicylic 
acid-responsive element (Stange (1997) Plant J. 11:1315-1324). 

In some aspects, proper polypeptide expression may require a polyadenylation 
region at the 3'-end of the coding region. The polyadenylation region can be derived from 
the natural gene, from a variety of other plant (or animal or other) genes, or from genes in 
the Agrobacterial T-DNA. 

Expression vectors and cloning vehicles 

The invention provides expression vectors and cloning vehicles comprising 
nucleic acids of the invention, e.g., sequences encoding the cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of the invention. 
Expression vectors and cloning vehicles of the invention can comprise viral particles, 
baculovirus, phage, plasmids, phagemids, cosmids, fosmids, bacterial artificial 
chromosomes, viral DNA (e.g., vaccinia, adenovirus, fowl pox virus, pseudorabies and 
derivatives of SV40), P1-based artificial chromosomes, yeast plasmids, yeast artificial 
chromosomes, and any other vectors specific for specific hosts of interest (such as 
Bacillus, Aspergillus and yeast). Vectors of the invention can include chromosomal, 
non-chromosomal and synthetic DNA sequences. Large numbers of suitable vectors are 
known to those of skill in the art, and are commercially available. Exemplary vectors 
include: bacterial: pQE™ vectors (Qiagen), pBLUESCRIPT™ plasmids, pNH vectors, 
lambda-ZAP vectors (Stratagene); ptrc99a, pKK223-3, pDR540, pRIT2T (Pharmacia); 
eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV, pMSG, pSVLSV40 (Pharmacia). 
However, any other plasmid or other vector may be used so long as it is replicable 
and viable in the host. Low copy number or high copy number vectors may be employed 
with the present invention. "Plasmids" can be commercially available, publicly available 
on an unrestricted basis, or can be constructed from available plasmids in accord with 
published procedures. Equivalent plasmids to those described herein are known in the art 
and will be apparent to the ordinarily skilled artisan. 

The expression vector can comprise a promoter, a ribosome binding site for 
translation initiation and a transcription terminator. The vector may also include 
appropriate sequences for amplifying expression. Mammalian expression vectors can 
comprise an origin of replication, any necessary ribosome binding sites, a 
polyadenylation site, splice donor and acceptor sites, transcriptional termination 
sequences, and 5' flanking non-transcribed sequences. In some aspects, DNA sequences 
derived from the SV40 splice and polyadenylation sites may be used to provide the 
required non-transcribed genetic elements. 

In one aspect, the expression vectors contain one or more selectable marker genes 
to permit selection of host cells containing the vector. Such selectable markers include 
genes encoding dihydrofolate reductase or genes conferring neomycin resistance for 
eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli, 
and the S. cerevisiae TRP1 gene. Promoter regions can be selected from any desired gene 
using chloramphenicol transferase (CAT) vectors or other vectors with selectable 
markers. 

In one aspect, vectors for expressing the polypeptide or fragment thereof in 
eukaryotic cells contain enhancers to increase expression levels. Enhancers are cis-acting 
elements of DNA that can be from about 10 to about 300 bp in length. They can act on a 
promoter to increase its transcription. Exemplary enhancers include the SV40 enhancer 
on the late side of the replication origin bp 100 to 270, the cytomegalovirus early 
promoter enhancer, the polyoma enhancer on the late side of the replication origin, and 
the adenovirus enhancers. 

A nucleic acid sequence can be inserted into a vector by a variety of procedures. 
In general, the sequence is ligated to the desired position in the vector following digestion 
of the insert and the vector with appropriate restriction endonucleases. Alternatively, 
blunt ends in both the insert and the vector may be ligated. A variety of cloning 
techniques are known in the art, e.g., as described in Ausubel and Sambrook. Such 
procedures and others are deemed to be within the scope of those skilled in the art. 
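As a non-limiting illustration, the choice of restriction endonucleases for such a digestion can be sketched in Python. The sketch assumes that a suitable enzyme cuts the vector exactly once and does not cut within the insert. The function names, the three-enzyme table and the example sequences are hypothetical, and the vector is treated as a linear string, so a site spanning the ends of a circular map would be missed.

```python
# Recognition sequences of three common restriction endonucleases.
# All three sites are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def count_sites(sequence: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site."""
    sequence = sequence.upper()
    count, start = 0, 0
    while True:
        index = sequence.find(site, start)
        if index == -1:
            return count
        count += 1
        start = index + 1


def usable_enzymes(insert: str, vector: str) -> list:
    """Enzymes that cut the vector exactly once and never cut the insert."""
    return sorted(
        name
        for name, site in RECOGNITION_SITES.items()
        if count_sites(insert, site) == 0 and count_sites(vector, site) == 1
    )


# Hypothetical toy sequences: the insert contains an internal HindIII site.
toy_insert = "ATGAAGCTTCCGTAA"
toy_vector = "CCGAATTCTTGGATCCAAAAGCTTGG"
print(usable_enzymes(toy_insert, toy_vector))
```

In this toy case HindIII is excluded because it would cut within the insert, leaving EcoRI and BamHI as candidates for directional cloning.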

The vector can be in the form of a plasmid, a viral particle, or a phage. Other 
vectors include chromosomal, non-chromosomal and synthetic DNA sequences, 
derivatives of SV40; bacterial plasmids, phage DNA, baculovirus, yeast plasmids, vectors 
derived from combinations of plasmids and phage DNA, viral DNA such as vaccinia, 
adenovirus, fowl pox virus, and pseudorabies. A variety of cloning and expression 
vectors for use with prokaryotic and eukaryotic hosts are described by, e.g., Sambrook. 

Particular bacterial vectors which can be used include the commercially available 
plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 
37017), pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden), GEM1 (Promega 
Biotec, Madison, WI, USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174, 
pBLUESCRIPT II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, 
pKK223-3, pKK233-3, DR540, pRIT5 (Pharmacia), pKK232-8 and pCM7. Particular 
eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene), pSVK3, pBPV, 
pMSG, and pSVL (Pharmacia). However, any other vector may be used as long as it is 
replicable and viable in the host cell. 

The nucleic acids of the invention can be expressed in expression cassettes, 
vectors or viruses and transiently or stably expressed in plant cells and seeds. One 
exemplary transient expression system uses episomal expression systems, e.g., 
cauliflower mosaic virus (CaMV) viral RNA generated in the nucleus by transcription of 
an episomal mini-chromosome containing supercoiled DNA, see, e.g., Covey (1990) 
Proc. Natl. Acad. Sci. USA 87:1633-1637. Alternatively, coding sequences, i.e., all or 
sub-fragments of sequences of the invention, can be inserted into a plant host cell genome, 
becoming an integral part of the host chromosomal DNA. Sense or antisense transcripts 
can be expressed in this manner. A vector comprising the sequences (e.g., promoters or 
coding regions) from nucleic acids of the invention can comprise a marker gene that 
confers a selectable phenotype on a plant cell or a seed. For example, the marker may 
encode biocide resistance, e.g., antibiotic resistance, such as resistance to kanamycin, 
G418, bleomycin, hygromycin, or herbicide resistance, such as resistance to 
chlorosulfuron or Basta. 

Expression vectors capable of expressing nucleic acids and proteins in plants are 
well known in the art, and can include, e.g., vectors from Agrobacterium spp., potato 
virus X (see, e.g., Angell (1997) EMBO J. 16:3675-3684), tobacco mosaic virus (see, 
e.g., Casper (1996) Gene 173:69-73), tomato bushy stunt virus (see, e.g., Hillman (1989) 
Virology 169:42-50), tobacco etch virus (see, e.g., Dolja (1997) Virology 234:243-252), 
bean golden mosaic virus (see, e.g., Morinaga (1993) Microbiol. Immunol. 37:471-476), 
cauliflower mosaic virus (see, e.g., Cecchini (1997) Mol. Plant Microbe Interact. 
10:1094-1101), maize Ac/Ds transposable element (see, e.g., Rubin (1997) Mol. Cell. 
Biol. 17:6294-6302; Kunze (1996) Curr. Top. Microbiol. Immunol. 204:161-194), and the 
maize suppressor-mutator (Spm) transposable element (see, e.g., Schlappi (1996) Plant 
Mol. Biol. 32:717-725); and derivatives thereof. 

In one aspect, the expression vector can have two replication systems to allow it to 
be maintained in two organisms, for example in mammalian or insect cells for expression 
and in a prokaryotic host for cloning and amplification. Furthermore, for integrating 
expression vectors, the expression vector can contain at least one sequence homologous 
to the host cell genome. It can contain two homologous sequences which flank the 
expression construct. The integrating vector can be directed to a specific locus in the host 
cell by selecting the appropriate homologous sequence for inclusion in the vector. 
Constructs for integrating vectors are well known in the art. 
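As a non-limiting illustration, the arrangement of an integrating construct can be sketched in Python as an expression construct flanked by two sequences copied from either side of the chosen locus. The function name, the arm length and the toy sequences are hypothetical and serve only to show the layout; practical homology arms are far longer than those used here.

```python
def build_integration_cassette(genome: str, locus: int, arm_len: int,
                               construct: str) -> str:
    """Flank `construct` with homology arms taken from either side of `locus`.

    `locus` is the 0-based position in `genome` at which the construct is
    intended to integrate; `arm_len` is the length of each homology arm.
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if locus - arm_len < 0 or locus + arm_len > len(genome):
        raise ValueError("homology arms would run past the end of the genome")
    left_arm = genome[locus - arm_len:locus]    # sequence upstream of the locus
    right_arm = genome[locus:locus + arm_len]   # sequence downstream of the locus
    return left_arm + construct + right_arm


# Hypothetical toy genome and construct.
toy_genome = "AAAACCCCGGGGTTTT"
cassette = build_integration_cassette(toy_genome, locus=8, arm_len=4,
                                      construct="ATGTAA")
print(cassette)
```

Selecting a different `locus` changes both arms, which corresponds to directing the vector to a different site in the host genome.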

Expression vectors of the invention may also include a selectable marker gene to 
allow for the selection of bacterial strains that have been transformed, e.g., genes which 
render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, 
kanamycin, neomycin and tetracycline. Selectable markers can also include biosynthetic 
genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways. 

The DNA sequence in the expression vector is operatively linked to an appropriate 
expression control sequence(s) (promoter) to direct RNA synthesis. Particular named 
bacterial promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic 
promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, 
LTRs from retrovirus and mouse metallothionein-I. Selection of the appropriate vector 
and promoter is well within the level of ordinary skill in the art. The expression vector 
also contains a ribosome binding site for translation initiation and a transcription 
terminator. The vector may also include appropriate sequences for amplifying 
expression. Promoter regions can be selected from any desired gene using 
chloramphenicol transferase (CAT) vectors or other vectors with selectable markers. In 
addition, the expression vectors in one aspect contain one or more selectable marker 
genes to provide a phenotypic trait for selection of transformed host cells such as 
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as 
tetracycline or ampicillin resistance in E. coli. 

Mammalian expression vectors may also comprise an origin of replication, any 
necessary ribosome binding sites, a polyadenylation site, splice donor and acceptor sites, 
transcriptional termination sequences and 5' flanking nontranscribed sequences. In some 
aspects, DNA sequences derived from the SV40 splice and polyadenylation sites may be 
used to provide the required nontranscribed genetic elements. 

Vectors for expressing the polypeptide or fragment thereof in eukaryotic cells may 
also contain enhancers to increase expression levels. Enhancers are cis-acting elements 
of DNA, usually from about 10 to about 300 bp in length, that act on a promoter to 
increase its transcription. Examples include the SV40 enhancer on the late side of the 
replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the 
polyoma enhancer on the late side of the replication origin and the adenovirus enhancers. 

In addition, the expression vectors can contain one or more selectable marker 
genes to permit selection of host cells containing the vector. Such selectable markers 
include genes encoding dihydrofolate reductase or genes conferring neomycin resistance 
for eukaryotic cell culture, genes conferring tetracycline or ampicillin resistance in E. coli 
and the S. cerevisiae TRP1 gene. 

In some aspects, the nucleic acid encoding one of the polypeptides of the 
invention, or fragments comprising at least about 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 
or 150 or more consecutive amino acids thereof is assembled in appropriate phase with a 
leader sequence capable of directing secretion of the translated polypeptide or fragment 
thereof. In one aspect, the nucleic acid can encode a fusion polypeptide in which one of 
the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 
35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof is fused to 
heterologous peptides or polypeptides, such as N-terminal identification peptides which 
impart desired characteristics, such as increased stability or simplified purification. 
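As a non-limiting illustration, assembling a coding sequence "in appropriate phase" with a leader sequence can be sketched in Python. The sketch assumes that the leader begins with an ATG start codon, that its length is a multiple of three so the downstream reading frame is preserved, and that it contains no in-frame stop codon. The function names and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(sequence: str) -> list:
    """Split a sequence into complete codons, ignoring any trailing bases."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def fuse_in_frame(leader: str, coding: str) -> str:
    """Join a leader to a coding sequence while preserving the reading frame."""
    leader, coding = leader.upper(), coding.upper()
    if not leader.startswith("ATG"):
        raise ValueError("leader must begin with an ATG start codon")
    if len(leader) % 3 != 0:
        raise ValueError("leader length must be a multiple of three")
    if any(codon in STOP_CODONS for codon in split_codons(leader)):
        raise ValueError("leader contains an in-frame stop codon")
    return leader + coding


# Hypothetical toy leader (three codons) and coding fragment.
fusion = fuse_in_frame("ATGAAACTG", "GCTTGGTAA")
print(split_codons(fusion))
```

A leader whose length is not a multiple of three would shift the frame of every downstream codon, which is why the sketch rejects it rather than padding it.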
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The appropriate DNA sequence may be inserted into the vector by a variety of 
procedures. In general, the DNA sequence is ligated to the desired position in the vector 
following digestion of the insert and the vector with appropriate restriction 
endonucleases. Alternatively, blunt ends in both the insert and the vector may be ligated. 
A variety of cloning techniques are disclosed in Ausubel et al., Current Protocols in 
Molecular Biology, John Wiley & Sons, Inc. (1997) and Sambrook et al., Molecular 
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press (1989). Such 
procedures and others are deemed to be within the scope of those skilled in the art. 

The vector may be, for example, in the form of a plasmid, a viral particle, or a 
phage. Other vectors include chromosomal, nonchromosomal and synthetic DNA 
sequences, derivatives of SV40; bacterial plasmids, phage DNA, baculovirus, yeast 
plasmids, vectors derived from combinations of plasmids and phage DNA, viral DNA 
such as vaccinia, adenovirus, fowl pox virus and pseudorabies. A variety of cloning and 
expression vectors for use with prokaryotic and eukaryotic hosts are described by 
Sambrook, et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring 
Harbor, N.Y., (1989). 

Host cells and transformed cells 

The invention also provides a transformed cell comprising a nucleic acid sequence 
of the invention, e.g., a sequence encoding a cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme of the invention, or a 
vector of the invention. The host cell may be any of the host cells familiar to those 
skilled in the art, including prokaryotic cells, eukaryotic cells, such as bacterial cells, 
fungal cells, yeast cells, mammalian cells, insect cells, or plant cells. Exemplary bacterial 
cells include any species of Streptomyces, Staphylococcus or Bacillus, or the exemplary 
species E. coli, Bacillus subtilis, Bacillus cereus, Salmonella typhimurium. Exemplary 
insect cells include any species of Spodoptera or Drosophila, including Drosophila S2 
and Spodoptera Sf9. Exemplary animal cells include CHO, COS or Bowes melanoma or 
any mouse or human cell line. The selection of an appropriate host is within the abilities 
of those skilled in the art. Techniques for transforming a wide variety of higher plant 
species are well known and described in the technical and scientific literature. See, e.g., 
Weising (1988) Ann. Rev. Genet. 22:421-477; U.S. Patent No. 5,750,870. 

The vector can be introduced into the host cells using any of a variety of 
techniques, including transformation, transfection, transduction, viral infection, gene 
guns, or Ti-mediated gene transfer. Particular methods include calcium phosphate 
transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation (Davis, 
L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986)). 

In one aspect, the nucleic acids or vectors of the invention are introduced into the 
cells for screening; thus, the nucleic acids enter the cells in a manner suitable for 
subsequent expression of the nucleic acid. The method of introduction is largely dictated 
by the targeted cell type. Exemplary methods include CaPO4 precipitation, liposome 
fusion, lipofection (e.g., LIPOFECTIN™), electroporation, viral infection, etc. The 
candidate nucleic acids may stably integrate into the genome of the host cell (for 
example, with retroviral introduction) or may exist either transiently or stably in the 
cytoplasm (i.e., through the use of traditional plasmids, utilizing standard regulatory 
sequences, selection markers, etc.). As many pharmaceutically important screens require 
human or model mammalian cell targets, retroviral vectors capable of transfecting such 
targets can be used. 

Where appropriate, the engineered host cells can be cultured in conventional 
nutrient media modified as appropriate for activating promoters, selecting transformants 
or amplifying the genes of the invention. Following transformation of a suitable host 
strain and growth of the host strain to an appropriate cell density, the selected promoter 
may be induced by appropriate means (e.g., temperature shift or chemical induction) and 
the cells may be cultured for an additional period to allow them to produce the desired 
polypeptide or fragment thereof. 

Cells can be harvested by centrifugation, disrupted by physical or chemical 
means, and the resulting crude extract is retained for further purification. Microbial cells 
employed for expression of proteins can be disrupted by any convenient method, 
including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing 
agents. Such methods are well known to those skilled in the art. The expressed 
polypeptide or fragment thereof can be recovered and purified from recombinant cell 
cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, 
anion or cation exchange chromatography, phosphocellulose chromatography, 
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite 
chromatography and lectin chromatography. Protein refolding steps can be used, as 
necessary, in completing configuration of the polypeptide. If desired, high performance 
liquid chromatography (HPLC) can be employed for final purification steps. 

The constructs in host cells can be used in a conventional manner to produce the 
gene product encoded by the recombinant sequence. Depending upon the host employed 
in a recombinant production procedure, the polypeptides produced by host cells 
containing the vector may be glycosylated or may be non-glycosylated. Polypeptides of 
the invention may or may not also include an initial methionine amino acid residue. 

Cell-free translation systems can also be employed to produce a polypeptide of the 
invention. Cell-free translation systems can use mRNAs transcribed from a DNA 
construct comprising a promoter operably linked to a nucleic acid encoding the 
polypeptide or fragment thereof. In some aspects, the DNA construct may be linearized 
prior to conducting an in vitro transcription reaction. The transcribed mRNA is then 
incubated with an appropriate cell-free translation extract, such as a rabbit reticulocyte 
extract, to produce the desired polypeptide or fragment thereof. 

The expression vectors can contain one or more selectable marker genes to 
provide a phenotypic trait for selection of transformed host cells such as dihydrofolate 
reductase or neomycin resistance for eukaryotic cell culture, or such as tetracycline or 
ampicillin resistance in E. coli. 

Host cells containing the polynucleotides of interest, e.g., nucleic acids of the 
invention, can be cultured in conventional nutrient media modified as appropriate for 
activating promoters, selecting transformants or amplifying genes. The culture 
conditions, such as temperature, pH and the like, are those previously used with the host 
cell selected for expression and will be apparent to the ordinarily skilled artisan. The 
clones which are identified as having the specified enzyme activity may then be 
sequenced to identify the polynucleotide sequence encoding an enzyme having the 
enhanced activity. 

The invention provides a method for overexpressing a recombinant cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme in a cell 
comprising expressing a vector comprising a nucleic acid of the invention, e.g., a nucleic 
acid comprising a nucleic acid sequence with at least about 50%, 51%, 52%, 53%, 54%, 
55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 
70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 
85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 
more sequence identity to an exemplary sequence of the invention over a region of at 
least about 100 residues, wherein the sequence identities are determined by analysis with 
a sequence comparison algorithm or by visual inspection, or, a nucleic acid that 
hybridizes under stringent conditions to a nucleic acid sequence of the invention. The 
overexpression can be effected by any means, e.g., use of a high activity promoter, a 
dicistronic vector or by gene amplification of the vector. 
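As a non-limiting illustration, the sequence identity criterion recited above can be sketched in Python for two sequences that have already been aligned to equal length. The sketch is a simplification and is not a sequence comparison algorithm such as BLAST. The function names, the default thresholds (50% identity over a region of at least 100 residues, taken from the lower bounds recited above) and the toy sequences are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns holding the same residue (gaps never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_criterion(aligned_a: str, aligned_b: str,
                             min_identity: float = 50.0,
                             min_region: int = 100) -> bool:
    """True if the aligned region is long enough and identical enough."""
    return (len(aligned_a) >= min_region
            and percent_identity(aligned_a, aligned_b) >= min_identity)


# Hypothetical 100-residue toy sequences differing at every fourth position.
seq_a = "ACGT" * 25
seq_b = "ACGA" * 25
print(percent_identity(seq_a, seq_b))
print(meets_identity_criterion(seq_a, seq_b, min_identity=50.0))
```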

The nucleic acids of the invention can be expressed, or overexpressed, in any in 
vitro or in vivo expression system. Any cell culture systems can be employed to express, 
or over-express, recombinant protein, including bacterial, insect, yeast, fungal or 
mammalian cultures. Over-expression can be effected by appropriate choice of 
promoters, enhancers, vectors (e.g., use of replicon vectors, dicistronic vectors (see, e.g., 
Gurtu (1996) Biochem. Biophys. Res. Commun. 229:295-8)), media, culture systems and 
the like. In one aspect, gene amplification using selection markers, e.g., glutamine 
synthetase (see, e.g., Sanders (1987) Dev. Biol. Stand. 66:55-63), in cell systems is used 
to overexpress the polypeptides of the invention. The host cell may be any of the host 
cells familiar to those skilled in the art, including prokaryotic cells, eukaryotic cells, 
mammalian cells, insect cells, or plant cells. The selection of an appropriate host is 
within the abilities of those skilled in the art. 

The vector may be introduced into the host cells using any of a variety of 
techniques, including transformation, transfection, transduction, viral infection, gene guns, 
or Ti-mediated gene transfer. Particular methods include calcium phosphate transfection, 
DEAE-Dextran mediated transfection, lipofection, or electroporation (Davis, L., Dibner, 
M., Battey, I., Basic Methods in Molecular Biology, (1986)). 

Where appropriate, the engineered host cells can be cultured in conventional 
nutrient media modified as appropriate for activating promoters, selecting transformants 
or amplifying the genes of the invention. Following transformation of a suitable host 
strain and growth of the host strain to an appropriate cell density, the selected promoter 
may be induced by appropriate means (e.g., temperature shift or chemical induction) and 
the cells may be cultured for an additional period to allow them to produce the desired 
polypeptide or fragment thereof. 

Cells can be harvested by centrifugation, disrupted by physical or chemical means 
and the resulting crude extract is retained for further purification. Microbial cells 
employed for expression of proteins can be disrupted by any convenient method, 
including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing 
agents. Such methods are well known to those skilled in the art. The expressed 
polypeptide or fragment thereof can be recovered and purified from recombinant cell 
cultures by methods including ammonium sulfate or ethanol precipitation, acid extraction, 
anion or cation exchange chromatography, phosphocellulose chromatography, 
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite 
chromatography and lectin chromatography. Protein refolding steps can be used, as 
necessary, in completing configuration of the polypeptide. If desired, high performance 
liquid chromatography (HPLC) can be employed for final purification steps. 

Various mammalian cell culture systems can also be employed to express 
recombinant protein. Examples of mammalian expression systems include the COS-7 
lines of monkey kidney fibroblasts (described by Gluzman, Cell, 23:175, 1981) and other 
cell lines capable of expressing proteins from a compatible vector, such as the C127, 3T3, 
CHO, HeLa and BHK cell lines. 

The constructs in host cells can be used in a conventional manner to produce the 
gene product encoded by the recombinant sequence. Depending upon the host employed 
in a recombinant production procedure, the polypeptides produced by host cells 
containing the vector may be glycosylated or may be non-glycosylated. Polypeptides of 
the invention may or may not also include an initial methionine amino acid residue. 

Alternatively, the polypeptides of the invention, or fragments comprising at least 
5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof 
can be synthetically produced by conventional peptide synthesizers, e.g., as discussed 
below. In other aspects, fragments or portions of the polypeptides may be employed for 
producing the corresponding full-length polypeptide by peptide synthesis; therefore, the 
fragments may be employed as intermediates for producing the full-length polypeptides. 

Cell-free translation systems can also be employed to produce one of the 
polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 
40, 50, 75, 100, or 150 or more consecutive amino acids thereof, using mRNAs transcribed 
from a DNA construct comprising a promoter operably linked to a nucleic acid encoding 
the polypeptide or fragment thereof. In some aspects, the DNA construct may be 
linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is 
then incubated with an appropriate cell-free translation extract, such as a rabbit 
reticulocyte extract, to produce the desired polypeptide or fragment thereof. 

Amplification of Nucleic Acids 

In practicing the invention, nucleic acids of the invention and nucleic acids 
encoding the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta- 
glucosidase enzymes of the invention, or modified nucleic acids of the invention, can be 
reproduced by amplification, e.g., PCR. Amplification can also be used to clone or 
modify the nucleic acids of the invention. Thus, the invention provides amplification 
primer sequence pairs for amplifying nucleic acids of the invention. One of skill in the 
art can design amplification primer sequence pairs for any part of or the full length of 
these sequences. 

In one aspect, the invention provides a nucleic acid amplified by an amplification 
primer pair of the invention, e.g., a primer pair as set forth by about the first (the 5') 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of a nucleic acid of 
the invention, and about the first (the 5') 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or 
more residues of the complementary strand. The invention provides amplification primer 
sequence pairs for amplifying a nucleic acid encoding a polypeptide having a cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
activity, wherein the primer pair is capable of amplifying a nucleic acid comprising a 
sequence of the invention, or fragments or subsequences thereof. One or each member of 
the amplification primer sequence pair can comprise an oligonucleotide comprising at 
least about 10 to 50 or more consecutive bases of the sequence, or about 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more consecutive bases of the sequence. The 
invention provides amplification primer pairs, wherein the primer pair comprises a first 
member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 22, 23, 24, or 25 or more residues of a nucleic acid of the invention, and a 
second member having a sequence as set forth by about the first (the 5') 12, 13, 14, 15, 
16, 17, 18, 19, 20, 21, 22, 23, 24, or 25 or more residues of the complementary strand of 
the first member. 
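As a non-limiting illustration, a primer pair of the kind described above can be derived in Python by taking the first (5') residues of a sequence and the first (5') residues of its complementary strand, the latter being read 5' to 3' as the reverse complement. The function names, the default primer length of 20 and the toy template are hypothetical; the sketch does not evaluate melting temperature, secondary structure or specificity.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence.upper()))


def primer_pair(template: str, first_len: int = 20,
                second_len: int = 20) -> Tuple[str, str]:
    """First member: 5' residues of the template.
    Second member: 5' residues of the complementary strand."""
    template = template.upper()
    if min(first_len, second_len) <= 0:
        raise ValueError("primer lengths must be positive")
    if max(first_len, second_len) > len(template):
        raise ValueError("primer length exceeds template length")
    first_member = template[:first_len]
    second_member = reverse_complement(template)[:second_len]
    return first_member, second_member


# Hypothetical 18-residue toy template with 6-residue primers.
forward, reverse = primer_pair("ATGCCGTTAGGCATTCAA", first_len=6, second_len=6)
print(forward, reverse)
```

Because the second member anneals to the 3' end of the template strand, the two primers together bracket the full length of the sequence to be amplified.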

The invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzymes generated by amplification, e.g., polymerase 
chain reaction (PCR), using an amplification primer pair of the invention. The invention 
provides methods of making a cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme by amplification, e.g., PCR, using an 
amplification primer pair of the invention. In one aspect, the amplification primer pair 
amplifies a nucleic acid from a library, e.g., a gene library, such as an environmental 
library. 

Amplification reactions can also be used to quantify the amount of nucleic acid in
a sample (such as the amount of message in a cell sample), label the nucleic acid (e.g., to 
apply it to an array or a blot), detect the nucleic acid, or quantify the amount of a specific 
nucleic acid in a sample. In one aspect of the invention, message isolated from a cell or a 
cDNA library is amplified.

The skilled artisan can select and design suitable oligonucleotide amplification
primers. Amplification methods are also well known in the art, and include, e.g.,
polymerase chain reaction, PCR (see, e.g., PCR PROTOCOLS, A GUIDE TO
METHODS AND APPLICATIONS, ed. Innis, Academic Press, N.Y. (1990) and PCR
STRATEGIES (1995), ed. Innis, Academic Press, Inc., N.Y.); ligase chain reaction (LCR)
(see, e.g., Wu (1989) Genomics 4:560; Landegren (1988) Science 241:1077; Barringer
(1990) Gene 89:117); transcription amplification (see, e.g., Kwoh (1989) Proc. Natl.
Acad. Sci. USA 86:1173); self-sustained sequence replication (see, e.g., Guatelli
(1990) Proc. Natl. Acad. Sci. USA 87:1874); Q Beta replicase amplification (see, e.g.,
Smith (1997) J. Clin. Microbiol. 35:1477-1491); automated Q-beta replicase
amplification assay (see, e.g., Burg (1996) Mol. Cell. Probes 10:257-271); and other RNA
polymerase mediated techniques (e.g., NASBA, Cangene, Mississauga, Ontario); see also
Berger (1987) Methods Enzymol. 152:307-316; Sambrook; Ausubel; U.S. Patent Nos.
4,683,195 and 4,683,202; Sooknanan (1995) Biotechnology 13:563-564.

Determining sequence identity in nucleic acids and polypeptides 

The invention provides nucleic acids comprising sequences having at least about
50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 
65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 
80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 
95%, 96%, 97%, 98%, 99%, or more, or complete (100%) sequence identity (homology) 
to an exemplary nucleic acid of the invention (see also Tables 1, 2, and 3, Examples 1 and 
4, below, and Sequence Listing) over a region of at least about 50, 75, 100, 150, 200, 250, 
300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 
1150, 1200, 1250, 1300, 1350, 1400, 1450, 1500, 1550 or more, residues. The invention 
provides polypeptides comprising sequences having at least about 50%, 51%, 52%, 53%,
54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 
69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 
84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 
99%, or more, or complete (100%) sequence identity to an exemplary polypeptide of the 
invention (see Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence Listing). The 
extent of sequence identity (homology) may be determined using any computer program 
and associated parameters, including those described herein, such as BLAST 2.2.2 or
FASTA version 3.0t78, with the default parameters.

Nucleic acid sequences of the invention can comprise at least 10, 15, 20, 25, 30,
35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more consecutive nucleotides of an 
exemplary sequence of the invention and sequences substantially identical thereto. 
Homologous sequences and fragments of nucleic acid sequences of the invention can 
refer to a sequence having at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 
58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 
73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 
88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence 
identity (homology) to these sequences. Homology (sequence identity) may be 
determined using any of the computer programs and parameters described herein, 
including FASTA version 3.0t78 with the default parameters. Homologous sequences 
also include RNA sequences in which uridines replace the thymines in the nucleic acid 
sequences of the invention. The homologous sequences may be obtained using any of the 
procedures described herein or may result from the correction of a sequencing error. It 
will be appreciated that the nucleic acid sequences of the invention can be represented in 
the traditional single character format (see the inside back cover of Stryer, Lubert,
Biochemistry, 3rd Ed., W. H. Freeman & Co., New York) or in any other format which
records the identity of the nucleotides in a sequence. 

In various aspects, sequence comparison programs identified herein are used in 
this aspect of the invention, i.e., to determine if a nucleic acid or polypeptide sequence is 
within the scope of the invention. However, protein and/or nucleic acid sequence 
identities (homologies) may be evaluated using any sequence comparison algorithm or 
program known in the art. Such algorithms and programs include, but are by no means
limited to, TBLASTN, BLASTP, FASTA, TFASTA and CLUSTALW (see, e.g., Pearson
and Lipman, Proc. Natl. Acad. Sci. USA 85(8):2444-2448, 1988; Altschul et al., J. Mol.
Biol. 215(3):403-410, 1990; Thompson, Nucleic Acids Res. 22(2):4673-4680, 1994;
Higgins et al., Methods Enzymol. 266:383-402, 1996; Altschul et al., J. Mol. Biol.
215(3):403-410, 1990; Altschul et al., Nature Genetics 3:266-272, 1993).

In one aspect, homology or identity is measured using sequence analysis software 
(e.g., Sequence Analysis Software Package of the Genetics Computer Group, University
of Wisconsin Biotechnology Center, 1710 University Avenue, Madison, WI 53705).
Such software matches similar sequences by assigning degrees of homology to various
deletions, substitutions and other modifications. In one aspect, the terms "homology" and
"identity" in the context of two or more nucleic acids or polypeptide sequences, refer to 
two or more sequences or subsequences that are the same or have a specified percentage
of amino acid residues or nucleotides that are the same when compared and aligned for
maximum correspondence over a comparison window or designated region as measured
using any number of sequence comparison algorithms or by manual alignment and visual
inspection. In one aspect, for sequence comparison, one sequence acts as a reference
sequence, to which test sequences are compared. When using a sequence comparison
algorithm, test and reference sequences are entered into a computer, subsequence
coordinates are designated, if necessary, and sequence algorithm program parameters are
designated. Default program parameters can be used, or alternative parameters can be
designated. The sequence comparison algorithm then calculates the percent sequence
identities for the test sequences relative to the reference sequence, based on the program
parameters.
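
As a non-limiting sketch of the calculation just described, percent sequence identity over a designated region of two already-aligned sequences might be computed as below. The function name, the use of '-' as the gap character and the choice of the number of alignment columns as the denominator are assumptions of this sketch; the programs named herein may use other conventions.

```python
from typing import Optional

def percent_identity(ref_aln: str, test_aln: str,
                     start: int = 0, end: Optional[int] = None) -> float:
    """Percent identity between two aligned sequences (gaps written as '-')
    over alignment columns [start, end). Hypothetical helper, not from the source."""
    if len(ref_aln) != len(test_aln):
        raise ValueError("aligned sequences must have equal length")
    end = len(ref_aln) if end is None else end
    columns = list(zip(ref_aln[start:end], test_aln[start:end]))
    if not columns:
        raise ValueError("empty comparison window")
    # A column counts as identical only if both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)

print(percent_identity("ACGT", "ACGA"))                # 75.0
print(percent_identity("ACGTAAAA", "ACGTCCCC", 0, 4))  # 100.0
```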

A "comparison window", as used herein, includes reference to a segment of any
one of the number of contiguous positions selected from the group consisting of from 20
to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a
sequence may be compared to a reference sequence of the same number of contiguous
positions after the two sequences are optimally aligned. Methods of alignment of
sequences for comparison are well known in the art. Optimal alignment of sequences for
comparison can be conducted, e.g., by the local homology algorithm of Smith &
Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of
Needleman & Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of
Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444, 1988, by computerized
implementations of these algorithms (GAP, BESTFIT, FASTA and TFASTA in the
Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr.,
Madison, WI), or by manual alignment and visual inspection. Other algorithms for
determining homology or identity include, for example, in addition to a BLAST program
(Basic Local Alignment Search Tool at the National Center for Biotechnology Information),
ALIGN, AMAS (Analysis of Multiply Aligned Sequences), AMPS (Protein Multiple
Sequence Alignment), ASSET (Aligned Segment Statistical Evaluation Tool), BANDS,
BESTSCOR, BIOSCAN (Biological Sequence Comparative Analysis Node), BLIMPS
(BLocks IMProved Searcher), FASTA, Intervals & Points, BMB, CLUSTAL V,
CLUSTAL W, CONSENSUS, LCONSENSUS, WCONSENSUS, Smith-Waterman
algorithm, DARWIN, Las Vegas algorithm, FNAT (Forced Nucleotide Alignment Tool),
Framealign, Framesearch, DYNAMIC, FILTER, FSAP (Fristensky Sequence Analysis
Package), GAP (Global Alignment Program), GENAL, GIBBS, GenQuest, ISSC
(Sensitive Sequence Comparison), LALIGN (Local Sequence Alignment), LCP (Local
Content Program), MACAW (Multiple Alignment Construction & Analysis Workbench),
MAP (Multiple Alignment Program), MBLKP, MBLKN, PIMA (Pattern-Induced
Multi-sequence Alignment), SAGA (Sequence Alignment by Genetic Algorithm) and
WHAT-IF. Such alignment programs can also be used to screen genome databases to identify
polynucleotide sequences having substantially identical sequences. A number of genome
databases are available, for example, a substantial portion of the human genome is
available as part of the Human Genome Sequencing Project (Gibbs, 1995). At least
twenty-one other genomes have already been sequenced, including, for example, M.
genitalium (Fraser et al., 1995), M. jannaschii (Bult et al., 1996), H. influenzae
(Fleischmann et al., 1995), E. coli (Blattner et al., 1997) and yeast (S. cerevisiae) (Mewes
et al., 1997) and D. melanogaster (Adams et al., 2000). Significant progress has also
been made in sequencing the genomes of model organisms, such as mouse, C. elegans and
Arabidopsis sp. Several databases containing genomic information annotated with some
functional information are maintained by different organizations and may be accessible
via the internet.
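
For orientation, the local homology algorithm of Smith & Waterman referred to above can be sketched in a few lines. The scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for this sketch; the sketch returns only the best local alignment score and does not reconstruct the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty).
    Scoring values are illustrative assumptions, not taken from the source."""
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("TTTACGTTTT", "GGACGGG"))  # 6: the shared "ACG"
```

Keeping only two rows of the table limits memory to the length of the second sequence, which is sufficient when only the score is needed.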

In one aspect, BLAST and BLAST 2.0 algorithms are used, which are described
in Altschul et al., Nuc. Acids Res. 25:3389-3402, 1997 and Altschul et al., J. Mol. Biol.
215:403-410, 1990, respectively. Software for performing BLAST analyses is publicly
available through the National Center for Biotechnology Information. This algorithm
involves first identifying high scoring sequence pairs (HSPs) by identifying short words
of length W in the query sequence, which either match or satisfy some positive-valued
threshold score T when aligned with a word of the same length in a database sequence. T
is referred to as the neighborhood word score threshold (Altschul et al., supra). These
initial neighborhood word hits act as seeds for initiating searches to find longer HSPs
containing them. The word hits are extended in both directions along each sequence for
as far as the cumulative alignment score can be increased. Cumulative scores are
calculated using, for nucleotide sequences, the parameters M (reward score for a pair of
matching residues; always >0) and N (penalty score for mismatching residues; always
<0). For amino acid sequences, a scoring matrix is used to
calculate the cumulative score. Extension of the word hits in each direction is halted
when: the cumulative alignment score falls off by the quantity X from its maximum
achieved value; the cumulative score goes to zero or below, due to the accumulation of
one or more negative-scoring residue alignments; or the end of either sequence is reached.
The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the
alignment. The BLASTN program (for nucleotide sequences) uses as defaults a
wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both
strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength
of 3 and expectations (E) of 10 and the BLOSUM62 scoring matrix (see Henikoff &
Henikoff, Proc. Natl. Acad. Sci. USA 89:10915, 1992) alignments (B) of 50, expectation
(E) of 10, M=5, N=-4 and a comparison of both strands.
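
The seed-and-extend procedure described above can be illustrated, in much simplified form, for nucleotide sequences. The sketch below is not the BLAST implementation: it seeds on exact word matches only (no neighborhood threshold T), performs ungapped extension, and evaluates the halting conditions separately in each direction starting from the seed score. The function names and the drop-off value used in the example are assumptions; W, M, N and X follow the parameter names in the text.

```python
def find_seeds(query: str, subject: str, w: int = 11):
    """Yield (query_pos, subject_pos) for every exact word match of length w."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            yield i, j

def extend_seed(query: str, subject: str, i: int, j: int,
                w: int = 11, m: int = 5, n: int = -4, x: int = 20):
    """Ungapped extension of the seed query[i:i+w] / subject[j:j+w].
    Returns (score, query_start, query_end) of the extended segment pair."""
    seed_score = w * m

    def walk(qi: int, sj: int, step: int):
        score = best = seed_score
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += m if query[qi] == subject[sj] else n
            length += 1
            if score > best:
                best, best_len = score, length
            # Halt on a drop of X from the maximum, or on a score of zero or below.
            if best - score >= x or score <= 0:
                break
            qi += step
            sj += step
        return best - seed_score, best_len

    right_gain, right_len = walk(i + w, j + w, +1)
    left_gain, left_len = walk(i - 1, j - 1, -1)
    return seed_score + left_gain + right_gain, i - left_len, i + w + right_len

# Made-up sequences sharing the segment "ACGTACGT"; a short word length keeps it readable.
q, s = "GGGACGTACGTCCC", "AAAACGTACGTTTT"
print(extend_seed(q, s, 3, 3, w=4, x=10))  # (40, 3, 11)
```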

The BLAST algorithm also performs a statistical analysis of the similarity
between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA
90:5873, 1993). One measure of similarity provided by the BLAST algorithm is the smallest
sum probability (P(N)), which provides an indication of the probability by which a match
between two nucleotide or amino acid sequences would occur by chance. For example, a
nucleic acid is considered similar to a reference sequence if the smallest sum probability
in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.2,
in one aspect less than about 0.01, and in another aspect less than about 0.001.

In one aspect, protein and nucleic acid sequence homologies are evaluated using
the Basic Local Alignment Search Tool ("BLAST"). In particular, five specific BLAST
programs are used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence
against a protein sequence database;

(2) BLASTN compares a nucleotide query sequence against a
nucleotide sequence database;

(3) BLASTX compares the six-frame conceptual translation products
of a query nucleotide sequence (both strands) against a protein sequence database;

(4) TBLASTN compares a query protein sequence against a nucleotide
sequence database translated in all six reading frames (both strands); and

(5) TBLASTX compares the six-frame translations of a nucleotide
query sequence against the six-frame translations of a nucleotide sequence database.

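
The "six-frame conceptual translation" used by BLASTX, TBLASTN and TBLASTX in the list above can be sketched as follows, assuming the standard genetic code. The function names and the frame labels (+1 to +3 for the given strand, -1 to -3 for the complementary strand) are conventions chosen for this sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna: str) -> str:
    """Translate one reading frame. Incomplete trailing codons are ignored;
    codons containing unexpected characters are reported as 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[k:k + 3], "X")
                   for k in range(0, len(dna) - 2, 3))

def six_frame_translations(dna: str) -> dict:
    """Return the three forward and three reverse-strand translations."""
    forward = dna.upper().replace("U", "T")
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

print(six_frame_translations("ATGGCCTAA")["+1"])  # MA*
```
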
The BLAST programs identify homologous sequences by identifying similar
segments, which are referred to herein as "high-scoring segment pairs," between a query
amino or nucleic acid sequence and a test sequence which is in one aspect obtained from
a protein or nucleic acid sequence database. High-scoring segment pairs are in one aspect
identified (i.e., aligned) by means of a scoring matrix, many of which are known in the
art. In one aspect, the scoring matrix used is the BLOSUM62 matrix (Gonnet (1992)
Science 256:1443-1445; Henikoff and Henikoff (1993) Proteins 17:49-61). In another
aspect, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and Dayhoff,
eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence
and Structure, Washington: National Biomedical Research Foundation). BLAST
programs are accessible through the U.S. National Library of Medicine.

The parameters used with the above algorithms may be adapted depending on the 
sequence length and degree of homology studied. In some aspects, the parameters may be 
the default parameters used by the algorithms in the absence of instructions from the user.

Computer systems and computer program products 

The invention provides computers, computer systems, computer readable
media, computer program products and the like having recorded or stored thereon the nucleic
acid and polypeptide sequences of the invention. Additionally, in practicing the methods
of the invention, e.g., to determine and identify sequence identities (to determine whether
a nucleic acid is within the scope of the invention), structural homologies, motifs and the
like in silico, a nucleic acid or polypeptide sequence of the invention can be stored,
recorded, and manipulated on any medium which can be read and accessed by a
computer.

As used herein, the words "recorded" and "stored" refer to a process for storing
information on a computer medium. A skilled artisan can readily adopt any known
methods for recording information on a computer readable medium to generate
manufactures comprising one or more of the nucleic acid and/or polypeptide sequences of
the invention. As used herein, the terms "computer," "computer program" and
"processor" are used in their broadest general contexts and incorporate all such devices,
as described in detail, below. A "coding sequence of" or a "sequence encodes" a
particular polypeptide or protein, is a nucleic acid sequence which is transcribed and
translated into a polypeptide or protein when placed under the control of appropriate
regulatory sequences.

The polypeptides of the invention include exemplary sequences of the invention
and sequences substantially identical thereto, and subsequences (fragments) of any of the
preceding sequences. In one aspect, substantially identical, or homologous, polypeptide
sequences refer to a polypeptide sequence having at least 50%, 51%, 52%, 53%, 54%,
55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%,
70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%,
85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or
more, or complete (100%) sequence identity (homology) to an exemplary sequence of the
invention.

Homology (sequence identity) may be determined using any of the computer
programs and parameters described herein. A nucleic acid or polypeptide sequence of the
invention can be stored, recorded and manipulated on any medium which can be read and
accessed by a computer. As used herein, the words "recorded" and "stored" refer to a
process for storing information on a computer medium. A skilled artisan can readily
adopt any of the presently known methods for recording information on a computer
readable medium to generate manufactures comprising one or more of the nucleic acid
sequences of the invention, or one or more of the polypeptide sequences of the invention.

Another aspect of the invention is a computer readable medium having recorded thereon 
at least 2, 5, 10, 15, or 20 or more nucleic acid or polypeptide sequences of the invention. 

Another aspect of the invention is a computer readable medium having recorded
thereon one or more of the nucleic acid sequences of the invention. Another aspect of the
invention is a computer readable medium having recorded thereon one or more of the
polypeptide sequences of the invention. Another aspect of the invention is a computer
readable medium having recorded thereon at least 2, 5, 10, 15, or 20 or more of the
nucleic acid or polypeptide sequences as set forth above.

Computer readable media include magnetically readable media, optically readable
media, electronically readable media and magnetic/optical media. For example, the
computer readable media may be a hard disk, a floppy disk, a magnetic tape, CD-ROM,
Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory
(ROM) as well as other types of media known to those skilled in the art.

Aspects of the invention include systems (e.g., internet based systems), e.g.,
computer systems which store and manipulate the sequence information described herein.
One example of a computer system 100 is illustrated in block diagram form in Figure 1.
As used herein, "a computer system" refers to the hardware components, software
components and data storage components used to analyze a nucleotide sequence of a
nucleic acid sequence of the invention, or a polypeptide sequence of the invention. In one
aspect, the computer system 100 includes a processor for processing, accessing and
manipulating the sequence data. The processor 105 can be any well-known type of
central processing unit, such as, for example, the Pentium III from Intel Corporation, or
similar processor from Sun, Motorola, Compaq, AMD or International Business
Machines.

In one aspect, the computer system 100 is a general purpose system that comprises
the processor 105 and one or more internal data storage components 110 for storing data
and one or more data retrieving devices for retrieving the data stored on the data storage
components. A skilled artisan can readily appreciate that any one of the currently
available computer systems is suitable.

In one particular aspect, the computer system 100 includes a processor 105
connected to a bus which is connected to a main memory 115 (in one aspect implemented
as RAM) and one or more internal data storage devices 110, such as a hard drive and/or
other computer readable media having data recorded thereon. In some aspects, the
computer system 100 further includes one or more data retrieving devices 118 for reading
the data stored on the internal data storage devices 110.

The data retrieving device 118 may represent, for example, a floppy disk drive, a
compact disk drive, a magnetic tape drive, or a modem capable of connection to a remote
data storage system (e.g., via the internet) etc. In some aspects, the internal data storage
device 110 is a removable computer readable medium such as a floppy disk, a compact
disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The
computer system 100 may advantageously include or be programmed by appropriate
software for reading the control logic and/or the data from the data storage component
once inserted in the data retrieving device.

The computer system 100 includes a display 120 which is used to display output
to a computer user. It should also be noted that the computer system 100 can be linked to
other computer systems 125a-c in a network or wide area network to provide centralized
access to the computer system 100.

Software for accessing and processing the nucleotide sequences of a nucleic acid
sequence of the invention, or a polypeptide sequence of the invention (such as search
tools, compare tools and modeling tools etc.) may reside in main memory 115 during
execution.

In some aspects, the computer system 100 may further comprise a sequence
comparison algorithm for comparing a nucleic acid sequence of the invention, or a
polypeptide sequence of the invention, stored on a computer readable medium to a
reference nucleotide or polypeptide sequence(s) stored on a computer readable medium.
A "sequence comparison algorithm" refers to one or more programs which are
implemented (locally or remotely) on the computer system 100 to compare a nucleotide
sequence with other nucleotide sequences and/or compounds stored within a data storage
means. For example, the sequence comparison algorithm may compare the nucleotide
sequences of a nucleic acid sequence of the invention, or a polypeptide sequence of the
invention, stored on a computer readable medium to reference sequences stored on a
computer readable medium to identify homologies or structural motifs.

Figure 2 is a flow diagram illustrating one aspect of a process 200 for comparing a 
new nucleotide or protein sequence with a database of sequences in order to determine the 
homology levels between the new sequence and the sequences in the database. The 
database of sequences can be a private database stored within the computer system 100, 
or a public database such as GENBANK that is available through the Internet. 

The process 200 begins at a start state 201 and then moves to a state 202 wherein 
the new sequence to be compared is stored to a memory in a computer system 100. As 
discussed above, the memory could be any type of memory, including RAM or an 
internal storage device. 

The process 200 then moves to a state 204 wherein a database of sequences is 
opened for analysis and comparison. The process 200 then moves to a state 206 wherein 
the first sequence stored in the database is read into a memory on the computer. A 
comparison is then performed at a state 210 to determine if the first sequence is the same 
as the second sequence. It is important to note that this step is not limited to performing 
an exact comparison between the new sequence and the first sequence in the database. 
Methods are well known to those of skill in the art for comparing two nucleotide
or protein sequences, even if they are not identical. For example, gaps can be introduced
into one sequence in order to raise the homology level between the two tested sequences. 
The parameters that control whether gaps or other features are introduced into a sequence 
during comparison are normally entered by the user of the computer system. 

Once a comparison of the two sequences has been performed at the state 210, a 
determination is made at a decision state 212 whether the two sequences are the same. Of
course, the term "same" is not limited to sequences that are absolutely identical. 
Sequences that are within the homology parameters entered by the user will be marked as 
"same" in the process 200. 

If a determination is made that the two sequences are the same, the process 200 
moves to a state 214 wherein the name of the sequence from the database is displayed to 
the user. This state notifies the user that the sequence with the displayed name fulfills the 
homology constraints that were entered. Once the name of the stored sequence is 
displayed to the user, the process 200 moves to a decision state 218 wherein a 
determination is made whether more sequences exist in the database. If no more
sequences exist in the database, then the process 200 terminates at an end state 220.
However, if more sequences do exist in the database, then the process 200 moves to a
state 224 wherein a pointer is moved to the next sequence in the database so that it can be
compared to the new sequence. In this manner, the new sequence is aligned and
compared with every sequence in the database.

It should be noted that if a determination had been made at the decision state 212
that the sequences were not homologous, then the process 200 would move immediately
to the decision state 218 in order to determine if any other sequences were available in the
database for comparison.
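
The loop of process 200 can be sketched as below, purely as an illustration. The database is modeled as a hypothetical mapping of sequence names to sequences, and the comparison at state 210 uses `difflib.SequenceMatcher` from the Python standard library as a stand-in comparator; it is not one of the homology programs named in this specification, and the threshold value is likewise an assumption.

```python
from difflib import SequenceMatcher
from typing import Dict, List

def similarity(a: str, b: str) -> float:
    """Stand-in comparator returning a ratio in [0, 1]; not BLAST or FASTA."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def search_database(new_sequence: str, database: Dict[str, str],
                    threshold: float = 0.9) -> List[str]:
    """Compare new_sequence with every stored sequence and report the names
    of those that meet the user-entered homology threshold."""
    hits = []
    for name, stored in database.items():          # states 206 and 224: read first/next sequence
        level = similarity(new_sequence, stored)   # state 210: perform the comparison
        if level >= threshold:                     # decision state 212: treated as the "same"?
            print(name)                            # state 214: display the name to the user
            hits.append(name)
    return hits                                    # end state 220: no more sequences

# Usage with made-up sequences.
db = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTAA", "seqC": "TTTTTTTTTT"}
search_database("ACGTACGTAC", db, threshold=0.85)
```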

Accordingly, one aspect of the invention is a computer system comprising a
processor, a data storage device having stored thereon a nucleic acid sequence of the
invention, or a polypeptide sequence of the invention, a data storage device having
retrievably stored thereon reference nucleotide sequences or polypeptide sequences to be
compared to a nucleic acid sequence of the invention, or a polypeptide sequence of the
invention, and a sequence comparer for conducting the comparison. The sequence
comparer may indicate a homology level between the sequences compared or identify
structural motifs in a nucleic acid sequence of the invention, or a polypeptide sequence
of the invention, or it may identify structural motifs in
sequences which are compared to these nucleic acid codes and polypeptide codes. In
some aspects, the data storage device may have stored thereon the sequences of at least 2,
5, 10, 15, 20, 25, 30 or 40 or more of the nucleic acid sequences of the invention, or the
polypeptide sequences of the invention.

Another aspect of the invention is a method for determining the level of homology
between a nucleic acid sequence of the invention, or a polypeptide sequence of the
invention, and a reference nucleotide sequence. The method includes reading the nucleic
acid code or the polypeptide code and the reference nucleotide or polypeptide sequence
through the use of a computer program which determines homology levels and
determining homology between the nucleic acid code or polypeptide code and the
reference nucleotide or polypeptide sequence with the computer program. The computer
program may be any of a number of computer programs for determining homology levels,
including those specifically enumerated herein (e.g., BLAST2N with the default
parameters or with any modified parameters). The method may be implemented using the
computer systems described above. The method may also be performed by reading at
least 2, 5, 10, 15, 20, 25, 30 or 40 or more of the above described nucleic acid sequences 
of the invention, or the polypeptide sequences of the invention through use of the 
computer program and determining homology between the nucleic acid codes or 
polypeptide codes and reference nucleotide sequences or polypeptide sequences.

Figure 3 is a flow diagram illustrating one aspect of a process 250 in a computer
for determining whether two sequences are homologous. The process 250 begins at a start
state 252 and then moves to a state 254 wherein a first sequence to be compared is stored
to a memory. The second sequence to be compared is then stored to a memory at a state
256. The process 250 then moves to a state 260 wherein the first character in the first
sequence is read and then to a state 262 wherein the first character of the second sequence
is read. It should be understood that if the sequence is a nucleotide sequence, then the
character would normally be either A, T, C, G or U. If the sequence is a protein
sequence, then it is in one aspect in the single letter amino acid code so that the first and
second sequences can be easily compared.

A determination is then made at a decision state 264 whether the two characters
are the same. If they are the same, then the process 250 moves to a state 268 wherein the
next characters in the first and second sequences are read. A determination is then made
whether the next characters are the same. If they are, then the process 250 continues this
loop until two characters are not the same. If a determination is made that the next two
characters are not the same, the process 250 moves to a decision state 274 to determine
whether there are any more characters in either sequence to read.

If there are not any more characters to read, then the process 250 moves to a state
276 wherein the level of homology between the first and second sequences is displayed to
the user. The level of homology is determined by calculating the proportion of characters
between the sequences that were the same out of the total number of characters in the first
sequence. Thus, if every character in a first 100 nucleotide sequence aligned with every
character in a second sequence, the homology level would be 100%.
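
A minimal sketch of process 250 follows. It reads the two sequences character by character, counts the positions that are the same, and reports the count as a percentage of the length of the first sequence. The function name is hypothetical, and the sketch assumes the ungapped, position-by-position reading that the flow diagram describes.

```python
def homology_level(first: str, second: str) -> float:
    """Percentage of characters in `first` that equal the character at the
    same position in `second` (no gaps are introduced)."""
    if not first:
        raise ValueError("first sequence must not be empty")
    matches = 0
    position = 0
    # Decision state 274: continue while both sequences have characters left.
    while position < len(first) and position < len(second):
        if first[position] == second[position]:   # decision state 264
            matches += 1
        position += 1                              # state 268: read next characters
    return 100.0 * matches / len(first)            # state 276: level of homology

print(homology_level("ACGT", "ACGA"))  # 75.0
```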

Alternatively, the computer program may be a computer program which compares
the nucleotide sequences of a nucleic acid sequence as set forth in the invention, to one or
more reference nucleotide sequences in order to determine whether the nucleic acid code
of the invention differs from a reference nucleic acid sequence at one or more positions.
Optionally such a program records the length and identity of inserted, deleted or
substituted nucleotides with respect to the sequence of either the reference polynucleotide
or a nucleic acid sequence of the invention. In one aspect, the computer program may be
a program which determines whether a nucleic acid sequence of the invention contains a
single nucleotide polymorphism (SNP) with respect to a reference nucleotide sequence.

Accordingly, another aspect of the invention is a method for determining whether
a nucleic acid sequence of the invention differs at one or more nucleotides from a
reference nucleotide sequence comprising the steps of reading the nucleic acid code and
the reference nucleotide sequence through use of a computer program which identifies 
differences between nucleic acid sequences and identifying differences between the 
nucleic acid code and the reference nucleotide sequence with the computer program. In 
some aspects, the computer program is a program which identifies single nucleotide 
polymorphisms. The method may be implemented by the computer systems described 
above and the method illustrated in Figure 3. The method may also be performed by 
reading at least 2, 5, 10, 15, 20, 25, 30, or 40 or more of the nucleic acid sequences of the 
invention and the reference nucleotide sequences through the use of the computer 
program and identifying differences between the nucleic acid codes and the reference 
nucleotide sequences with the computer program. 
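
As a non-limiting sketch of such a difference-identifying program, the Python fragment below reports single-position substitutions between a query sequence and a reference sequence. The function name `find_substitutions` is hypothetical, and the sketch assumes the two sequences are already aligned and of equal length; insertions and deletions are outside its scope.

```python
from typing import List, Tuple


def find_substitutions(query: str, reference: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, query base) for each mismatch.

    Assumption: both sequences are pre-aligned and have the same length.
    """
    if len(query) != len(reference):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    pairs = zip(query.upper(), reference.upper())
    return [(pos, r, q) for pos, (q, r) in enumerate(pairs, start=1) if q != r]


# A single substituted position is a candidate single nucleotide polymorphism.
for pos, ref_base, query_base in find_substitutions("ACGTAC", "ACCTAC"):
    print(f"position {pos}: {ref_base} -> {query_base}")
```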

In other aspects the computer based system may further comprise an identifier for 
identifying features within a nucleic acid sequence of the invention or a polypeptide 
sequence of the invention. An "identifier" refers to one or more programs which identify
certain features within a nucleic acid sequence of the invention, or a polypeptide sequence 
of the invention. In one aspect, the identifier may comprise a program which identifies an 
open reading frame in a nucleic acid sequence of the invention. 

Figure 4 is a flow diagram illustrating one aspect of an identifier process 300 for 
detecting the presence of a feature in a sequence. The process 300 begins at a start state 
302 and then moves to a state 304 wherein a first sequence that is to be checked for 
features is stored to a memory 115 in the computer system 100. The process 300 then 
moves to a state 306 wherein a database of sequence features is opened. Such a database 
would include a list of each feature's attributes along with the name of the feature. For 
example, a feature name could be "Initiation Codon" and the attribute would be "ATG". 
Another example would be the feature name "TAATAA Box" and the feature attribute 
would be "TAATAA". An example of such a database is produced by the University of 
Wisconsin Genetics Computer Group. Alternatively, the features may be structural 
polypeptide motifs such as alpha helices, beta sheets, or functional polypeptide motifs 
such as enzymatic active sites, helix-turn-helix motifs or other motifs known to those
skilled in the art.

Once the database of features is opened at the state 306, the process 300 moves to
a state 308 wherein the first feature is read from the database. A comparison of the
attribute of the first feature with the first sequence is then made at a state 310. A
determination is then made at a decision state 316 whether the attribute of the feature was
found in the first sequence. If the attribute was found, then the process 300 moves to a
state 318 wherein the name of the found feature is displayed to the user.

The process 300 then moves to a decision state 320 wherein a determination is
made whether more features exist in the database. If no more features do exist, then the
process 300 terminates at an end state 324. However, if more features do exist in the
database, then the process 300 reads the next sequence feature at a state 326 and loops
back to the state 310 wherein the attribute of the next feature is compared against the first
sequence. It should be noted that if the feature attribute is not found in the first sequence
at the decision state 316, the process 300 moves directly to the decision state 320 in order
to determine if any more features exist in the database.
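
The loop of identifier process 300 might be sketched as follows. This is a minimal illustration only: the in-memory dictionary stands in for the database of sequence features, the two entries are the examples named above, and the function name `identify_features` is hypothetical.

```python
from typing import Dict, List, Optional

# Stand-in for the feature database opened at state 306: name -> attribute.
FEATURE_DATABASE: Dict[str, str] = {
    "Initiation Codon": "ATG",
    "TAATAA Box": "TAATAA",
}


def identify_features(sequence: str,
                      features: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the names of all features whose attribute occurs in `sequence`."""
    if features is None:
        features = FEATURE_DATABASE
    seq = sequence.upper()                      # state 304: store the sequence
    found: List[str] = []
    for name, attribute in features.items():    # states 308 and 326: read a feature
        if attribute in seq:                    # states 310 and 316: compare
            found.append(name)                  # state 318: report the feature name
    return found                                # state 324: no more features


print(identify_features("ccATGaaTAATAAgg"))
```

Structural or functional polypeptide motifs would typically need pattern or profile matching rather than the plain substring test used here.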

Accordingly, another aspect of the invention is a method of identifying a feature
within a nucleic acid sequence of the invention, or a polypeptide sequence of the invention,
comprising reading the nucleic acid code(s) or polypeptide code(s) through the use of a
computer program which identifies features therein and identifying features within the
nucleic acid code(s) with the computer program. In one aspect, the computer program
comprises a computer program which identifies open reading frames. The method may
be performed by reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 40 or more
of the nucleic acid sequences of the invention, or the polypeptide sequences of the
invention, through the use of the computer program and identifying features within the
nucleic acid codes or polypeptide codes with the computer program.

A nucleic acid sequence of the invention, or a polypeptide sequence of the
invention, may be stored and manipulated in a variety of data processor programs in a
variety of formats. For example, a nucleic acid sequence of the invention, or a
polypeptide sequence of the invention, may be stored as text in a word processing file,
such as Microsoft WORD™ or WORDPERFECT™, or as an ASCII file in a variety of
database programs familiar to those of skill in the art, such as DB2™, SYBASE™, or
ORACLE™. In addition, many computer programs and databases may be used as
sequence comparison algorithms, identifiers, or sources of reference nucleotide sequences
or polypeptide sequences to be compared to a nucleic acid sequence of the invention, or a
polypeptide sequence of the invention. The following list is intended not to limit the
invention but to provide guidance to programs and databases which are useful with the
nucleic acid sequences of the invention, or the polypeptide sequences of the invention.

The programs and databases which may be used include, but are not limited to:
MACPATTERN™ (EMBL), DISCOVERYBASE™ (Molecular Applications Group),
GENEMINE™ (Molecular Applications Group), LOOK™ (Molecular Applications
Group), MACLOOK™ (Molecular Applications Group), BLAST and BLAST2 (NCBI),
BLASTN and BLASTX (Altschul et al., J. Mol. Biol. 215: 403, 1990), FASTA (Pearson
and Lipman, Proc. Natl. Acad. Sci. USA, 85: 2444, 1988), FASTDB (Brutlag et al.,
Comp. App. Biosci. 6:237-245, 1990), CATALYST™ (Molecular Simulations Inc.),
Catalyst/SHAPE™ (Molecular Simulations Inc.), Cerius².DBAccess™ (Molecular
Simulations Inc.), HYPOGEN™ (Molecular Simulations Inc.), INSIGHT II™
(Molecular Simulations Inc.), DISCOVER™ (Molecular Simulations Inc.), CHARMm™
(Molecular Simulations Inc.), FELIX™ (Molecular Simulations Inc.), DELPHI™
(Molecular Simulations Inc.), QuanteMM™ (Molecular Simulations Inc.), Homology
(Molecular Simulations Inc.), MODELER™ (Molecular Simulations Inc.), ISIS™
(Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.),
WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular
Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular
Simulations Inc.), the MDL Available Chemicals Directory database, the MDL Drug Data
Report data base, the Comprehensive Medicinal Chemistry database, Derwent's World
Drug Index database, the BioByteMasterFile database, the Genbank database and the
Genseqn database. Many other programs and data bases would be apparent to one of skill
in the art given the present disclosure.

Motifs which may be detected using the above programs include sequences
encoding leucine zippers, helix-turn-helix motifs, glycosylation sites, ubiquitination sites,
alpha helices and beta sheets, signal sequences encoding signal peptides which direct the
secretion of the encoded proteins, sequences implicated in transcription regulation such as
homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites and
enzymatic cleavage sites.

Hybridization of nucleic acids

The invention provides isolated or recombinant nucleic acids that hybridize under
stringent conditions to an exemplary sequence of the invention (e.g., SEQ ID NO:1, SEQ
ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13,
SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ
ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID
NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID
NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID
NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID
NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID
NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID
NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID
NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID
NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID
NO:115, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID
NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID
NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID
NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID
NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163 or SEQ
ID NO:165 (see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence
Listing)). The stringent conditions can be highly stringent conditions, medium stringent
conditions and/or low stringent conditions, including the high and reduced stringency
conditions described herein. In one aspect, it is the stringency of the wash conditions that
set forth the conditions which determine whether a nucleic acid is within the scope of the
invention, as discussed below.

"Hybridization" refers to the process by which a nucleic acid strand joins with a
complementary strand through base pairing. Hybridization reactions can be sensitive and
selective so that a particular sequence of interest can be identified even in samples in
which it is present at low concentrations. Suitably stringent conditions can be defined by,
for example, the concentrations of salt or formamide in the prehybridization and
hybridization solutions, or by the hybridization temperature, and are well known in the art.
In alternative aspects, stringency can be increased by reducing the concentration of salt,
increasing the concentration of formamide, or raising the hybridization temperature. In
alternative aspects, nucleic acids of the invention are defined by their ability to hybridize
under various stringency conditions (e.g., high, medium, and low), as set forth herein.

In one aspect, hybridization under high stringency conditions comprises about 50%
formamide at about 37°C to 42°C. In one aspect, hybridization conditions comprise
reduced stringency conditions in about 35% to 25% formamide at about 30°C to 35°C. In
one aspect, hybridization conditions comprise high stringency conditions, e.g., at 42°C in
50% formamide, 5X SSPE, 0.3% SDS and 200 n/ml sheared and denatured salmon sperm
DNA. In one aspect, hybridization conditions comprise these reduced stringency
conditions, but in 35% formamide at a reduced temperature of 35°C. The temperature
range corresponding to a particular level of stringency can be further narrowed by
calculating the purine to pyrimidine ratio of the nucleic acid of interest and adjusting the
temperature accordingly. Variations on the above ranges and conditions are well known
in the art.

In alternative aspects, nucleic acids of the invention as defined by their ability to
hybridize under stringent conditions can be between about five residues and the full
length of nucleic acid of the invention; e.g., they can be at least 5, 10, 15, 20, 25, 30, 35,
40, 50, 55, 60, 65, 70, 75, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600,
650, 700, 750, 800, 850, 900, 950, 1000, or more, residues in length. Nucleic acids
shorter than full length are also included. These nucleic acids can be useful as, e.g.,
hybridization probes, labeling probes, PCR oligonucleotide probes, siRNA or miRNA
(single or double stranded), antisense or sequences encoding antibody binding peptides
(epitopes), motifs, active sites and the like.

In one aspect, nucleic acids of the invention are defined by their ability to
hybridize under high stringency comprising conditions of about 50% formamide at about
37°C to 42°C. In one aspect, nucleic acids of the invention are defined by their ability to
hybridize under reduced stringency comprising conditions in about 35% to 25%
formamide at about 30°C to 35°C.

Alternatively, nucleic acids of the invention are defined by their ability to
hybridize under high stringency comprising conditions at 42°C in 50% formamide, 5X
SSPE, 0.3% SDS, and a repetitive sequence blocking nucleic acid, such as cot-1 or
salmon sperm DNA (e.g., 200 n/ml sheared and denatured salmon sperm DNA). In one
aspect, nucleic acids of the invention are defined by their ability to hybridize under
reduced stringency conditions comprising 35% or 40% formamide at a reduced
temperature of 35°C or 42°C.

In nucleic acid hybridization reactions, the conditions used to achieve a particular
level of stringency will vary, depending on the nature of the nucleic acids being
hybridized. For example, the length, degree of complementarity, nucleotide sequence
composition (e.g., GC v. AT content) and nucleic acid type (e.g., RNA v. DNA) of the
hybridizing regions of the nucleic acids can be considered in selecting hybridization
conditions. An additional consideration is whether one of the nucleic acids is
immobilized, for example, on a filter.

Hybridization may be carried out under conditions of low stringency, moderate
stringency or high stringency. As an example of nucleic acid hybridization, a polymer
membrane containing immobilized denatured nucleic acids is first prehybridized for 30
minutes at 45°C in a solution consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0
mM Na2EDTA, 0.5% SDS, 10X Denhardt's and 0.5 mg/ml polyriboadenylic acid.
Approximately 2 X 10⁷ cpm (specific activity 4-9 X 10⁸ cpm/ug) of ³²P end-labeled
oligonucleotide probe are then added to the solution. After 12-16 hours of incubation, the
membrane is washed for 30 minutes at room temperature in 1X SET (150 mM NaCl, 20
mM Tris hydrochloride, pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, followed by a
30 minute wash in fresh 1X SET at Tm-10°C for the oligonucleotide probe. The
membrane is then exposed to auto-radiographic film for detection of hybridization
signals. All of the foregoing hybridizations would be considered to be under conditions
of high stringency.

Following hybridization, a filter can be washed to remove any non-specifically
bound detectable probe. The stringency used to wash the filters can also be varied
depending on the nature of the nucleic acids being hybridized, the length of the nucleic
acids being hybridized, the degree of complementarity, the nucleotide sequence
composition (e.g., GC v. AT content) and the nucleic acid type (e.g., RNA v. DNA).
Examples of progressively higher stringency condition washes are as follows: 2X SSC,
0.1% SDS at room temperature for 15 minutes (low stringency); 0.1X SSC, 0.5% SDS at
room temperature for 30 minutes to 1 hour (moderate stringency); 0.1X SSC, 0.5% SDS
for 15 to 30 minutes at between the hybridization temperature and 68°C (high
stringency); and 0.15M NaCl for 15 minutes at 72°C (very high stringency). A final low
stringency wash can be conducted in 0.1X SSC at room temperature. The examples
above are merely illustrative of one set of conditions that can be used to wash filters. One
of skill in the art would know that there are numerous recipes for different stringency
washes. Some other examples are given below.

In one aspect, hybridization conditions comprise a wash step comprising a wash
for 30 minutes at room temperature in a solution comprising 1X 150 mM NaCl, 20 mM
Tris hydrochloride, pH 7.8, 1 mM Na2EDTA, 0.5% SDS, followed by a 30 minute wash
in fresh solution.




Nucleic acids which have hybridized to the probe are identified by 
autoradiography or other conventional techniques. 

The above procedures may be modified to identify nucleic acids having
decreasing levels of sequence identity (homology) to the probe sequence. For example,
to obtain nucleic acids of decreasing sequence identity (homology) to the detectable
probe, less stringent conditions may be used. For example, the hybridization temperature
may be decreased in increments of 5°C from 68°C to 42°C in a hybridization buffer
having a Na+ concentration of approximately 1M. Following hybridization, the filter
may be washed with 2X SSC, 0.5% SDS at the temperature of hybridization. These
conditions are considered to be "moderate" conditions above 50°C and "low" conditions
below 50°C. A specific example of "moderate" hybridization conditions is when the
above hybridization is conducted at 55°C. A specific example of "low stringency"
hybridization conditions is when the above hybridization is conducted at 45°C.

Alternatively, the hybridization may be carried out in buffers, such as 6X SSC,
containing formamide at a temperature of 42°C. In this case, the concentration of
formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0%
to identify clones having decreasing levels of homology to the probe. Following
hybridization, the filter may be washed with 6X SSC, 0.5% SDS at 50°C. These
conditions are considered to be "moderate" conditions above 25% formamide and "low"
conditions below 25% formamide. A specific example of "moderate" hybridization
conditions is when the above hybridization is conducted at 30% formamide. A specific
example of "low stringency" hybridization conditions is when the above hybridization is
conducted at 10% formamide.
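
The two titration schemes described above (temperature steps at approximately 1M Na+, and formamide steps at 42°C) can be tabulated with a short Python sketch. The function name `classify_stringency` and the label "unspecified" are hypothetical; the latter is used because the text labels conditions above and below each threshold but not the threshold value itself.

```python
def classify_stringency(value: float, threshold: float) -> str:
    """Label a hybridization condition relative to the stated threshold."""
    if value > threshold:
        return "moderate"
    if value < threshold:
        return "low"
    return "unspecified"  # the text does not classify the threshold itself


# Temperature series: 5 degree C steps downward from 68, not going below 42.
temperatures_c = list(range(68, 41, -5))
# Formamide series at 42 degrees C: 5% steps from 50% down to 0%.
formamide_percent = list(range(50, -1, -5))

for t in temperatures_c:
    print(f"{t} C: {classify_stringency(t, 50)}")
for f in formamide_percent:
    print(f"{f}% formamide: {classify_stringency(f, 25)}")
```

Note that the formamide series includes exactly 25%, the one point the stated ranges leave unclassified.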

However, the selection of a hybridization format may not be critical - it is the
stringency of the wash conditions that set forth the conditions which determine whether a
nucleic acid is within the scope of the invention. Wash conditions used to identify
nucleic acids within the scope of the invention include, e.g.: a salt concentration of about
0.02 molar at pH 7 and a temperature of at least about 50°C or about 55°C to about 60°C;
or, a salt concentration of about 0.15 M NaCl at 72°C for about 15 minutes; or, a salt
concentration of about 0.2X SSC at a temperature of at least about 50°C or about 55°C to
about 60°C for about 15 to about 20 minutes; or, the hybridization complex is washed
twice with a solution with a salt concentration of about 2X SSC containing 0.1% SDS at
room temperature for 15 minutes and then washed twice by 0.1X SSC containing 0.1%
SDS at 68°C for 15 minutes; or, equivalent conditions. See Sambrook, Tijssen and
Ausubel for a description of SSC buffer and equivalent conditions.

These methods may be used to isolate or identify nucleic acids of the invention.
For example, the preceding methods may be used to isolate or identify nucleic acids
having a sequence with at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%,
59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%,
74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%,
89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more sequence identity
(homology) to a nucleic acid sequence selected from the group consisting of one of the
sequences of the invention, or fragments comprising at least about 10, 15, 20, 25, 30, 35,
40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases thereof and the sequences
complementary thereto. Sequence identity (homology) may be measured using the
alignment algorithm. For example, the homologous polynucleotides may have a coding
sequence which is a naturally occurring allelic variant of one of the coding sequences
described herein. Such allelic variants may have a substitution, deletion or addition of
one or more nucleotides when compared to the nucleic acids of the invention.
Additionally, the above procedures may be used to isolate nucleic acids which encode
polypeptides having at least about 99%, 95%, at least 90%, at least 85%, at least 80%, at
least 75%, at least 70%, at least 65%, at least 60%, at least 55%, or at least 50% sequence
identity (homology) to a polypeptide of the invention, or fragments comprising at least 5,
10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof as
determined using a sequence alignment algorithm (e.g., the FASTA version
3.0t78 algorithm with the default parameters).

Oligonucleotides probes and methods for using them 

The invention also provides nucleic acid probes that can be used, e.g., for
identifying, amplifying, or isolating nucleic acids encoding a polypeptide having a
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme activity or fragments thereof or for identifying cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme genes. In one aspect, the
probe comprises at least about 10 consecutive bases of a nucleic acid of the invention.
Alternatively, a probe of the invention can be at least about 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110,
120, 130, 150 or about 10 to 50, about 20 to 60, about 30 to 70, consecutive bases of a
sequence as set forth in a nucleic acid of the invention. The probes identify a nucleic acid
by binding and/or hybridization. The probes can be used in arrays of the invention, see
discussion below, including, e.g., capillary arrays. The probes of the invention can also
be used to isolate other nucleic acids or polypeptides.

The isolated or recombinant nucleic acids of the invention, the sequences
complementary thereto, or a fragment comprising at least about 10, 15, 20, 25, 30, 35, 40,
50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases of one of the sequences of the
invention, or the sequences complementary thereto may also be used as probes to
determine whether a biological sample, such as a soil sample, contains an organism
having a nucleic acid sequence of the invention or an organism from which the nucleic
acid was obtained. In such procedures, a biological sample potentially harboring the
organism from which the nucleic acid was isolated is obtained and nucleic acids are
obtained from the sample. The nucleic acids are contacted with the probe under
conditions which permit the probe to specifically hybridize to any complementary
sequences which are present therein.

Where necessary, conditions which permit the probe to specifically hybridize to
complementary sequences may be determined by placing the probe in contact with
complementary sequences from samples known to contain the complementary sequence
as well as control sequences which do not contain the complementary sequence.
Hybridization conditions, such as the salt concentration of the hybridization buffer, the
formamide concentration of the hybridization buffer, or the hybridization temperature,
may be varied to identify conditions which allow the probe to hybridize specifically to
complementary nucleic acids.

If the sample contains the organism from which the nucleic acid was isolated, 
specific hybridization of the probe is then detected. Hybridization may be detected by
labeling the probe with a detectable agent such as a radioactive isotope, a fluorescent dye
or an enzyme capable of catalyzing the formation of a detectable product. 

Many methods for using the labeled probes to detect the presence of
complementary nucleic acids in a sample are familiar to those skilled in the art. These
include Southern Blots, Northern Blots, colony hybridization procedures and dot blots.
Protocols for each of these procedures are provided in Ausubel et al., Current Protocols in
Molecular Biology, John Wiley & Sons, Inc. (1997) and Sambrook et al., Molecular
Cloning: A Laboratory Manual 2nd Ed., Cold Spring Harbor Laboratory Press (1989).

Alternatively, more than one probe (at least one of which is capable of specifically
hybridizing to any complementary sequences which are present in the nucleic acid
sample), may be used in an amplification reaction to determine whether the sample
contains an organism containing a nucleic acid sequence of the invention (e.g., an
organism from which the nucleic acid was isolated). In one aspect, the probes comprise
oligonucleotides. In one aspect, the amplification reaction may comprise a PCR reaction.
PCR protocols are described in Ausubel and Sambrook, supra. Alternatively, the
amplification may comprise a ligase chain reaction, 3SR, or strand displacement reaction.
(See Barany, F., "The Ligase Chain Reaction in a PCR World", PCR Methods and
Applications 1:5-16, 1991; E. Fahy et al., "Self-sustained Sequence Replication (3SR): An
Isothermal Transcription-based Amplification System Alternative to PCR", PCR Methods
and Applications 1:25-33, 1991; and Walker G.T. et al., "Strand Displacement
Amplification-an Isothermal in vitro DNA Amplification Technique", Nucleic Acid
Research 20:1691-1696, 1992). In such procedures, the nucleic acids in the sample are
contacted with the probes, the amplification reaction is performed and any resulting
amplification product is detected. The amplification product may be detected by performing
gel electrophoresis on the reaction products and staining the gel with an intercalator such as
ethidium bromide. Alternatively, one or more of the probes may be labeled with a
radioactive isotope and the presence of a radioactive amplification product may be detected
by autoradiography after gel electrophoresis.

Probes derived from sequences near the ends of the sequences of the invention
may also be used in chromosome walking procedures to identify clones containing
genomic sequences located adjacent to the sequences of the invention. Such methods
allow the isolation of genes which encode additional proteins from the host organism.

In one aspect, the isolated or recombinant nucleic acids of the invention, the
sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30,
35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 or more consecutive bases of one of the
sequences of the invention, or the sequences complementary thereto are used as probes to
identify and isolate related nucleic acids. In some aspects, the related nucleic acids may
be cDNAs or genomic DNAs from organisms other than the one from which the nucleic
acid was isolated. For example, the other organisms may be related organisms. In such
procedures, a nucleic acid sample is contacted with the probe under conditions which
permit the probe to specifically hybridize to related sequences. Hybridization of the
probe to nucleic acids from the related organism is then detected using any of the
methods described above.

By varying the stringency of the hybridization conditions used to identify nucleic
acids, such as cDNAs or genomic DNAs, which hybridize to the detectable probe, nucleic
acids having different levels of homology to the probe can be identified and isolated.
Stringency may be varied by conducting the hybridization at varying temperatures below the
melting temperatures of the probes. The melting temperature, Tm, is the temperature (under
defined ionic strength and pH) at which 50% of the target sequence hybridizes to a perfectly
complementary probe. Very stringent conditions are selected to be equal to or about 5°C
lower than the Tm for a particular probe. The melting temperature of the probe may be
calculated using the following formulas:

For probes between 14 and 70 nucleotides in length the melting temperature (Tm) is
calculated using the formula: Tm=81.5+16.6(log [Na+])+0.41(fraction G+C)-(600/N)
where N is the length of the probe.

If the hybridization is carried out in a solution containing formamide, the melting
temperature may be calculated using the equation: Tm=81.5+16.6(log
[Na+])+0.41(fraction G+C)-(0.63% formamide)-(600/N) where N is the length of the
probe.
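
The two melting temperature equations can be evaluated with a small Python sketch. The function name `melting_temperature` is hypothetical. The sketch assumes that "log" is the base-10 logarithm, that [Na+] is in mol/L, and that the G+C term is supplied as a percentage (0 to 100), which is the convention under which the 0.41 coefficient is commonly applied; if "fraction G+C" is read literally as a value between 0 and 1, the G+C term would have to be rescaled accordingly.

```python
import math


def melting_temperature(na_molar: float, gc_percent: float, length: int,
                        formamide_percent: float = 0.0) -> float:
    """Estimate probe Tm in degrees C from the equations in the text.

    Assumptions: base-10 log, [Na+] in mol/L, G+C and formamide in percent.
    With formamide_percent=0 this reduces to the first (formamide-free) equation.
    """
    if na_molar <= 0 or length <= 0:
        raise ValueError("na_molar and length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length)


# A 20-nucleotide probe with 50% G+C in 1 M Na+, without and with 50% formamide.
print(round(melting_temperature(1.0, 50.0, 20), 1))
print(round(melting_temperature(1.0, 50.0, 20, formamide_percent=50.0), 1))
```

The text applies the first equation to probes between 14 and 70 nucleotides in length; the sketch does not enforce that range.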

Prehybridization may be carried out in 6X SSC, 5X Denhardt's reagent, 0.5% SDS,
100 µg denatured fragmented salmon sperm DNA or 6X SSC, 5X Denhardt's reagent, 0.5%
SDS, 100 µg denatured fragmented salmon sperm DNA, 50% formamide. The formulas for
SSC and Denhardt's solutions are listed in Sambrook et al., supra.

In one aspect, hybridization is conducted by adding the detectable probe to the
prehybridization solutions listed above. Where the probe comprises double stranded DNA,
it is denatured before addition to the hybridization solution. In one aspect, the filter is
contacted with the hybridization solution for a sufficient period of time to allow the probe to
hybridize to cDNAs or genomic DNAs containing sequences complementary thereto or
homologous thereto. For probes over 200 nucleotides in length, the hybridization may be
carried out at 15-25°C below the Tm. For shorter probes, such as oligonucleotide probes,
the hybridization may be conducted at 5-10°C below the Tm. In one aspect, for
hybridizations in 6X SSC, the hybridization is conducted at approximately 68°C. Usually,
for hybridizations in 50% formamide containing solutions, the hybridization is conducted at
approximately 42°C.
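
The probe-length rule stated above can be expressed as a short Python sketch. The function name `hybridization_temperature_range` is hypothetical, and treating every probe of 200 nucleotides or fewer as a "shorter probe" is an assumption, since the text names only oligonucleotide probes as an example of that class.

```python
from typing import Tuple


def hybridization_temperature_range(tm_c: float, probe_length: int) -> Tuple[float, float]:
    """Return the (lowest, highest) suggested hybridization temperature in degrees C.

    Probes over 200 nucleotides: 15-25 degrees C below Tm.
    Shorter probes (assumed here to mean 200 nucleotides or fewer): 5-10 degrees C below Tm.
    """
    if probe_length <= 0:
        raise ValueError("probe_length must be positive")
    if probe_length > 200:
        return (tm_c - 25.0, tm_c - 15.0)
    return (tm_c - 10.0, tm_c - 5.0)


print(hybridization_temperature_range(80.0, 250))
print(hybridization_temperature_range(60.0, 20))
```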
Inhibiting Expression of Cellulase Enzymes

The invention provides nucleic acids complementary to (e.g., antisense sequences 
to) the nucleic acids of the invention, e.g., cellulase enzyme-encoding nucleic acids, e.g., 
nucleic acids comprising antisense, siRNA, miRNA, ribozymes. Nucleic acids of the 
invention comprising antisense sequences can be capable of inhibiting the transport, 
splicing or transcription of cellulase enzyme-encoding genes. The inhibition can be 
effected through the targeting of genomic DNA or messenger RNA. The transcription or 
function of targeted nucleic acid can be inhibited, for example, by hybridization and/or 
cleavage. One exemplary set of inhibitors provided by the present invention includes 
oligonucleotides which are able to either bind cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme gene or message, in either 
case preventing or inhibiting the production or function of a cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme. The 
association can be through sequence specific hybridization. Another useful class of 
inhibitors includes oligonucleotides which cause inactivation or cleavage of cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
message. The oligonucleotide can have enzyme activity which causes such cleavage, 
such as ribozymes. The oligonucleotide can be chemically modified or conjugated to an 
enzyme or composition capable of cleaving the complementary nucleic acid. A pool of 
many different such oligonucleotides can be screened for those with the desired activity. 
Thus, the invention provides various compositions for the inhibition of cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme expression 
on a nucleic acid and/or protein level, e.g., antisense, siRNA, miRNA and ribozymes 
comprising cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta- 
glucosidase enzyme sequences of the invention and the anti-cellulase, e.g., anti- 
endoglucanase, anti-cellobiohydrolase and/or anti-beta-glucosidase antibodies of the 
invention. 

Inhibition of cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase enzyme expression can have a variety of industrial applications. For 
example, inhibition of cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme expression can slow or prevent spoilage. In one aspect, 
compositions of the invention that inhibit the expression and/or activity of 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzymes, e.g., antibodies, antisense oligonucleotides, ribozymes, siRNA and miRNA, are 
used to slow or prevent spoilage. Thus, in one aspect, the invention provides methods 
and compositions comprising application onto a plant or plant product (e.g., a cereal, a 
grain, a fruit, seed, root, leaf, etc.) of antibodies, antisense oligonucleotides, ribozymes, 
siRNA and miRNA of the invention to slow or prevent spoilage. These compositions also 
can be expressed by the plant (e.g., a transgenic plant) or another organism (e.g., a 
bacterium or other microorganism transformed with a cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme gene of the invention). 

The compositions of the invention for the inhibition of cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme expression 
(e.g., antisense, iRNA, ribozymes, antibodies) can be used as pharmaceutical 
compositions, e.g., as anti-pathogen agents or in other therapies, e.g., as anti-microbials 
for, e.g., Salmonella. 

Antisense Oligonucleotides 

The invention provides antisense oligonucleotides capable of binding cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
message which, in one aspect, can inhibit cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity by targeting 
mRNA. Strategies for designing antisense oligonucleotides are well described in the 
scientific and patent literature, and the skilled artisan can design such cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
oligonucleotides using the novel reagents of the invention. For example, gene walking/ 
RNA mapping protocols to screen for effective antisense oligonucleotides are well known 
in the art, see, e.g., Ho (2000) Methods Enzymol. 314:168-183, describing an RNA 
mapping assay, which is based on standard molecular techniques to provide an easy and 
reliable method for potent antisense sequence selection. See also Smith (2000) Eur. J. 
Pharm. Sci. 11:191-198. 
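
As an illustration only, the "gene walking" idea of tiling candidate antisense sequences along a message can be sketched in Python. The function name, the default window length of 20 and the example message are hypothetical choices made for this sketch and are not taken from the cited protocols; real candidate selection would also rely on experimental screening.

```python
# Minimal sketch: tile a message with fixed-length windows and emit the DNA
# antisense (reverse complement) of each window. All names are illustrative.
COMPLEMENT = str.maketrans("ACGU", "TGCA")  # RNA base -> complementary DNA base


def antisense_candidates(mrna, length=20, step=1):
    """Yield (start, antisense_oligo) for each window of the message.

    `start` is the 0-based offset of the window in the message; the oligo is
    written 5'->3' and is the reverse complement of that window.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA or RNA spelling
    for start in range(0, len(mrna) - length + 1, step):
        window = mrna[start:start + length]
        yield start, window.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Hypothetical message fragment, not a sequence of the invention.
    for start, oligo in antisense_candidates("AUGGCCAAGCUUGGAUCCGAAUUC", length=18, step=3):
        print(start, oligo)
```

Changing `length` within the ranges discussed in the text (for example about 18 to 40) gives alternative candidate sets for routine screening.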

Naturally occurring nucleic acids are used as antisense oligonucleotides. The 
antisense oligonucleotides can be of any length; for example, in alternative aspects, the 
antisense oligonucleotides are between about 5 to 100, about 10 to 80, about 15 to 60, 
about 18 to 40. The optimal length can be determined by routine screening. The 
antisense oligonucleotides can be present at any concentration. The optimal 
concentration can be determined by routine screening. A wide variety of synthetic, non- 
naturally occurring nucleotide and nucleic acid analogues are known which can address 
this potential problem. For example, peptide nucleic acids (PNAs) containing non-ionic 
backbones, such as N-(2-aminoethyl) glycine units can be used. Antisense 
oligonucleotides having phosphorothioate linkages can also be used, as described in WO 
97/03211; WO 96/39154; Mata (1997) Toxicol Appl Pharmacol 144:189-197; Antisense 
Therapeutics, ed. Agrawal (Humana Press, Totowa, N.J., 1996). Antisense 
oligonucleotides having synthetic DNA backbone analogues provided by the invention 
can also include phosphoro-dithioate, methylphosphonate, phosphoramidate, alkyl 
phosphotriester, sulfamate, 3'-thioacetal, methylene(methylimino), 3'-N-carbamate, and 
morpholino carbamate nucleic acids, as described above. 

Combinatorial chemistry methodology can be used to create vast numbers of 
oligonucleotides that can be rapidly screened for specific oligonucleotides that have 
appropriate binding affinities and specificities toward any target, such as the sense and 
antisense cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta- 
glucosidase enzyme sequences of the invention (see, e.g., Gold (1995) J. of Biol. Chem. 
270:13581-13584). 

Inhibitory Ribozymes 

The invention provides ribozymes capable of binding cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme message. 
These ribozymes can inhibit cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme activity by, e.g., targeting mRNA. Strategies for 
designing ribozymes and selecting the cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme-specific antisense sequence for targeting are 
well described in the scientific and patent literature, and the skilled artisan can design 
such ribozymes using the novel reagents of the invention. Ribozymes act by binding to a 
target RNA through the target RNA binding portion of a ribozyme which is held in close 
proximity to an enzymatic portion of the RNA that cleaves the target RNA. Thus, the 
ribozyme recognizes and binds a target RNA through complementary base-pairing, and 
once bound to the correct site, acts enzymatically to cleave and inactivate the target RNA. 
Cleavage of a target RNA in such a manner will destroy its ability to direct synthesis of 
an encoded protein if the cleavage occurs in the coding sequence. After a ribozyme has 
bound and cleaved its RNA target, it can be released from that RNA to bind and cleave 
new targets repeatedly. 

In some circumstances, the enzymatic nature of a ribozyme can be advantageous 
over other technologies, such as antisense technology (where a nucleic acid molecule 
simply binds to a nucleic acid target to block its transcription, translation or association 
with another molecule) as the effective concentration of ribozyme necessary to effect a 
therapeutic treatment can be lower than that of an antisense oligonucleotide. This 
potential advantage reflects the ability of the ribozyme to act enzymatically. Thus, a 
single ribozyme molecule is able to cleave many molecules of target RNA. In one aspect, 
a ribozyme is a highly specific inhibitor, with the specificity of inhibition depending not 
only on the base pairing mechanism of binding, but also on the mechanism by which the 
molecule inhibits the expression of the RNA to which it binds. That is, the inhibition is 
caused by cleavage of the RNA target and so specificity is defined as the ratio of the rate 
of cleavage of the targeted RNA over the rate of cleavage of non-targeted RNA. This 
cleavage mechanism is dependent upon factors additional to those involved in base 
pairing. Thus, the specificity of action of a ribozyme can be greater than that of antisense 
oligonucleotide binding the same RNA site. 

The ribozyme of the invention, e.g., an enzymatic ribozyme RNA molecule, can 
be formed in a hammerhead motif, a hairpin motif, as a hepatitis delta virus motif, a 
group I intron motif and/or an RNaseP-like RNA in association with an RNA guide 
sequence. Examples of hammerhead motifs are described by, e.g., Rossi (1992) Aids 
Research and Human Retroviruses 8:183; hairpin motifs by Hampel (1989) Biochemistry 
28:4929, and Hampel (1990) Nuc. Acids Res. 18:299; the hepatitis delta virus motif by 
Perrotta (1992) Biochemistry 31:16; the RNaseP motif by Guerrier-Takada (1983) Cell 
35:849; and the group I intron by Cech U.S. Pat. No. 4,987,071. The recitation of these 
specific motifs is not intended to be limiting. Those skilled in the art will recognize that a 
ribozyme of the invention, e.g., an enzymatic RNA molecule of this invention, can have a 
specific substrate binding site complementary to one or more of the target gene RNA 
regions. A ribozyme of the invention can have a nucleotide sequence within or 
surrounding that substrate binding site which imparts an RNA cleaving activity to the 
molecule. 

RNA interference (RNAi) 

In one aspect, the invention provides an RNA inhibitory molecule, a so-called 
"RNAi" molecule, comprising a cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme sequence of the invention. The RNAi 
molecule can comprise a double-stranded RNA (dsRNA) molecule, e.g., siRNA and/or 
miRNA. The RNAi molecule, e.g., siRNA and/or miRNA, can inhibit expression of a 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzyme gene. In one aspect, the RNAi molecule, e.g., siRNA and/or miRNA, is about 15, 
16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more duplex nucleotides in length. While the 
invention is not limited by any particular mechanism of action, the RNAi can enter a cell 
and cause the degradation of a single-stranded RNA (ssRNA) of similar or identical 
sequences, including endogenous mRNAs. When a cell is exposed to double-stranded 
RNA (dsRNA), mRNA from the homologous gene is selectively degraded by a process 
called RNA interference (RNAi). A possible basic mechanism behind RNAi is the 
breaking of a double-stranded RNA (dsRNA) matching a specific gene sequence into 
short pieces called short interfering RNA, which trigger the degradation of mRNA that 
matches its sequence. In one aspect, the RNAi's of the invention are used in gene- 
silencing therapeutics, see, e.g., Shuey (2002) Drug Discov. Today 7:1040-1046. In one 
aspect, the invention provides methods to selectively degrade RNA using the RNAi 
molecules, e.g., siRNA and/or miRNA, of the invention. The process may be practiced in 
vitro, ex vivo or in vivo. In one aspect, the RNAi molecules of the invention can be used 
to generate a loss-of-function mutation in a cell, an organ or an animal. Methods for 
making and using RNAi molecules, e.g., siRNA and/or miRNA, for selectively degrading 
RNA are well known in the art, see, e.g., U.S. Patent No. 6,506,559; 6,511,824; 
6,515,109; 6,489,127. 
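
The possible mechanism described above (a dsRNA is broken into short pieces, and messages matching a piece are degraded) can be modelled very loosely in Python. This is a toy string-matching sketch under strong assumptions: pieces are cut at fixed, non-overlapping offsets, only perfect matches count, and the names `dice` and `silenced` and all sequences are hypothetical. It makes no claim about how the cellular machinery actually selects cut sites.

```python
# Toy model of the mechanism described in the text; names and sequences are
# illustrative assumptions, not part of the disclosure.


def dice(sense_strand, size=21):
    """Split the sense strand of a dsRNA into consecutive pieces of `size`.

    A trailing remainder shorter than `size` is dropped.
    """
    s = sense_strand.upper().replace("T", "U")
    return [s[i:i + size] for i in range(0, len(s) - size + 1, size)]


def silenced(transcripts, fragments):
    """Return the names of transcripts containing at least one fragment."""
    hits = []
    for name, seq in transcripts.items():
        seq = seq.upper().replace("T", "U")
        if any(fragment in seq for fragment in fragments):
            hits.append(name)
    return hits


if __name__ == "__main__":
    pieces = dice("AUGGCUAGCUAGGAUCC", size=5)
    pool = {"target": "GGGAUGGCUAGCUAGGAUCCAAA", "unrelated": "CCCCCCCCCCCCCC"}
    print(pieces, silenced(pool, pieces))
```

The `size` parameter can be set anywhere in the about 15 to 25 nucleotide range recited above.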

Modification of Nucleic Acids - Making Variant Enzymes of the Invention 

The invention provides methods of generating variants of the nucleic acids of the 
invention, e.g., those encoding a cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme. These methods can be repeated or used in 
various combinations to generate cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzymes having an altered or different activity or an 
altered or different stability from that of a cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme encoded by the template 
nucleic acid. These methods also can be repeated or used in various combinations, e.g., 
to generate variations in gene/message expression, message translation or message 
stability. In another aspect, the genetic composition of a cell is altered by, e.g., 
modification of a homologous gene ex vivo, followed by its reinsertion into the cell. 

For example, in one aspect, the invention provides isolated or recombinant nucleic 
acids having a sequence comprising at least one nucleotide base residue modification of 
SEQ ID NO:163, wherein the modification comprises one or more of the following 
changes: a nucleotide at any one of positions 265 to 267 is modified to CGT, CGC, CGA, 
CGG, AGA or AGG; a nucleotide at any one of positions 307 to 309 is modified to GGT, 
GGC, GGA or GGG; a nucleotide at any one of positions 328 to 330 is modified to GGT, 
GGC, GGA or GGG; a nucleotide at any one of positions 340 to 342 is modified to TTA, 
TTG, CTT, CTC, CTA or CTG; a nucleotide at any one of positions 469 to 471 is 
modified to TCT, TCC, TCA, TCG, AGT or AGC; a nucleotide at any one of positions 
1441 to 1443 is modified to TTT or TTC; a nucleotide at any one of positions 1648 to 
1650 is modified to AAT or AAC; or, a nucleotide at any one of positions 1768 to 1770 is 
modified to CGT, CGC, CGA, CGG, AGA or AGG. In another aspect, the invention 
provides isolated or recombinant polypeptides having a sequence comprising at least one 
amino acid residue modification of SEQ ID NO:164, wherein the modification comprises 
one or more of the following changes: a methionine at amino acid position 89 is modified 
to arginine; a phenylalanine at amino acid position 103 is modified to glycine; a proline at 
amino acid position 110 is modified to glycine; a tyrosine at amino acid position 114 is 
modified to leucine; an alanine at amino acid position 157 is modified to serine; a 
tryptophan at amino acid position 481 is modified to phenylalanine; a proline at amino 
acid position 550 is modified to asparagine; or a glycine at amino acid position 590 is 
modified to arginine. 
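
The nucleotide positions and codon sets recited above follow from the standard genetic code: amino acid position n corresponds to nucleotide positions 3n-2 to 3n of the coding sequence, and each replacement residue is encoded by a fixed set of codons. The following Python sketch checks that correspondence. It assumes the standard genetic code and 1-based numbering from the first coding nucleotide; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of
# this sketch; the text does not name a translation table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Amino acid changes recited for SEQ ID NO:164: position -> (original, new).
CHANGES = {89: ("M", "R"), 103: ("F", "G"), 110: ("P", "G"), 114: ("Y", "L"),
           157: ("A", "S"), 481: ("W", "F"), 550: ("P", "N"), 590: ("G", "R")}


def codon_positions(aa_position):
    """Return the 1-based (first, last) nucleotide positions of a codon."""
    return 3 * aa_position - 2, 3 * aa_position


def codons_for(amino_acid):
    """Return all codons encoding a one-letter amino acid, sorted."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    for position, (old, new) in CHANGES.items():
        first, last = codon_positions(position)
        print(f"{old}{position}{new}: nt {first}-{last} -> {', '.join(codons_for(new))}")
```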

In another aspect, the invention provides isolated or recombinant nucleic acids 
having a sequence comprising a nucleotide residue sequence modification of an 
exemplary sequence of the invention (e.g., SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, 
SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, etc.) wherein the modification comprises 
one or more of the following changes: a nucleotide at the equivalent of any one of 
positions 265 to 267 of SEQ ID NO:163 is changed to CGT, CGC, CGA, CGG, AGA or 
AGG; a nucleotide at the equivalent of any one of positions 307 to 309 of SEQ ID 
NO:163 is changed to GGT, GGC, GGA or GGG; a nucleotide at the equivalent of any 
one of positions 328 to 330 of SEQ ID NO:163 is changed to GGT, GGC, GGA or 
GGG; a nucleotide at the equivalent of any one of positions 340 to 342 of SEQ ID 
NO:163 is changed to TTA, TTG, CTT, CTC, CTA or CTG; a nucleotide at the 
equivalent of any one of positions 469 to 471 of SEQ ID NO:163 is changed to TCT, 
TCC, TCA, TCG, AGT or AGC; a nucleotide at the equivalent of positions 1441 to 1443 
of SEQ ID NO:163 is changed to TTT or TTC; a nucleotide at the equivalent of any one 
of positions 1648 to 1650 of SEQ ID NO:163 is changed to AAT or AAC; or a 
nucleotide at the equivalent of any one of positions 1768 to 1770 of SEQ ID NO:163 is 
changed to CGT, CGC, CGA, CGG, AGA or AGG. In another aspect, the invention 
provides isolated or recombinant nucleic acids having a sequence comprising a nucleotide 
residue sequence modification of any nucleic acid of the invention, wherein the 
modification comprises one or more of the following changes: a nucleotide at the 
equivalent of any one of positions 265 to 267 of SEQ ID NO:163 is changed to CGT, 
CGC, CGA, CGG, AGA or AGG; a nucleotide at the equivalent of any one of positions 
307 to 309 of SEQ ID NO:163 is changed to GGT, GGC, GGA or GGG; a nucleotide at 
the equivalent of any one of positions 328 to 330 of SEQ ID NO:163 is changed to 
GGT, GGC, GGA or GGG; a nucleotide at the equivalent of any one of positions 340 to 
342 of SEQ ID NO:163 is changed to TTA, TTG, CTT, CTC, CTA or CTG; a 
nucleotide at the equivalent of any one of positions 469 to 471 of SEQ ID NO:163 is 
changed to TCT, TCC, TCA, TCG, AGT or AGC; a nucleotide at the equivalent of 
positions 1441 to 1443 of SEQ ID NO:163 is changed to TTT or TTC; a nucleotide at 
the equivalent of any one of positions 1648 to 1650 of SEQ ID NO:163 is changed to 
AAT or AAC; or, a nucleotide at the equivalent of any one of positions 1768 to 1770 of 
SEQ ID NO:163 is changed to CGT, CGC, CGA, CGG, AGA or AGG. 

In another aspect, the invention provides isolated or recombinant polypeptides 
having a sequence comprising an amino acid residue modification of an exemplary 
sequence of the invention (e.g., SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID 
NO:8, SEQ ID NO:10, etc.) wherein the modification comprises one or more of the 
following changes: an amino acid at the equivalent of the methionine at amino acid 
position 89 of SEQ ID NO:164 is changed to an arginine; an amino acid at the equivalent 
of the phenylalanine at amino acid position 103 of SEQ ID NO:164 is changed to a 
glycine; an amino acid at the equivalent of the proline at amino acid position 110 of SEQ 
ID NO:164 is changed to a glycine; an amino acid at the equivalent of the tyrosine at 
amino acid position 114 of SEQ ID NO:164 is changed to a leucine; an amino acid at the 
equivalent of the alanine at amino acid position 157 of SEQ ID NO:164 is changed to a 
serine; an amino acid at the equivalent of the tryptophan at amino acid position 481 of 
SEQ ID NO:164 is changed to a phenylalanine; an amino acid at the equivalent of the 
proline at amino acid position 550 of SEQ ID NO:164 is changed to an asparagine; or an 
amino acid at the equivalent of the glycine at amino acid position 590 of SEQ ID NO:164 
is changed to an arginine. 

In another aspect, the invention provides isolated or recombinant polypeptides 
having a sequence comprising an amino acid residue modification of any polypeptide of 
the invention, wherein the modification comprises one or more of the following changes: 
an amino acid at the equivalent of the methionine at amino acid position 89 of SEQ ID 
NO:164 is changed to an arginine; an amino acid at the equivalent of the phenylalanine at 
amino acid position 103 of SEQ ID NO:164 is changed to a glycine; an amino acid at the 
equivalent of the proline at amino acid position 110 of SEQ ID NO:164 is changed to a 
glycine; an amino acid at the equivalent of the tyrosine at amino acid position 114 of SEQ 
ID NO:164 is changed to a leucine; an amino acid at the equivalent of the alanine at 
amino acid position 157 of SEQ ID NO:164 is changed to a serine; an amino acid at the 
equivalent of the tryptophan at amino acid position 481 of SEQ ID NO:164 is changed to 
a phenylalanine; an amino acid at the equivalent of the proline at amino acid position 550 
of SEQ ID NO:164 is changed to an asparagine; or an amino acid at the equivalent of the 
glycine at amino acid position 590 of SEQ ID NO:164 is changed to an arginine. 

A nucleic acid of the invention can be altered by any means. For example, 
random or stochastic methods, or, non-stochastic, or "directed evolution," methods, see, 
e.g., U.S. Patent No. 6,361,974. Methods for random mutation of genes are well known 
in the art, see, e.g., U.S. Patent No. 5,830,696. For example, mutagens can be used to 
randomly mutate a gene. Mutagens include, e.g., ultraviolet light or gamma irradiation, 
or a chemical mutagen, e.g., mitomycin, nitrous acid, photoactivated psoralens, alone or 
in combination, to induce DNA breaks amenable to repair by recombination. Other 
chemical mutagens include, for example, sodium bisulfite, nitrous acid, hydroxylamine, 
hydrazine or formic acid. Other mutagens are analogues of nucleotide precursors, e.g., 
nitrosoguanidine, 5-bromouracil, 2-aminopurine, or acridine. These agents can be added 
to a PCR reaction in place of the nucleotide precursor thereby mutating the sequence. 
Intercalating agents such as proflavine, acriflavine, quinacrine and the like can also be 
used. 
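
For readers who want an in silico analogue of random mutation, the following Python sketch introduces a chosen number of point substitutions at random positions of a DNA string. It is an assumption-laden illustration only: it treats all positions and all substitutions as equally likely, which real mutagens and error-prone polymerases do not, and the function name and example sequence are hypothetical.

```python
import random


def random_point_mutations(sequence, n_mutations, seed=None):
    """Return a copy of `sequence` with `n_mutations` random substitutions.

    Each chosen position is replaced by a different base, so the result
    differs from the input at exactly `n_mutations` positions.
    """
    rng = random.Random(seed)  # a seed makes the sketch reproducible
    bases = list(sequence.upper())
    for position in rng.sample(range(len(bases)), n_mutations):
        alternatives = [b for b in "ACGT" if b != bases[position]]
        bases[position] = rng.choice(alternatives)
    return "".join(bases)


if __name__ == "__main__":
    parent = "ATGGCCAAGCTTGGATCCGAATTC"  # hypothetical sequence
    print(parent)
    print(random_point_mutations(parent, 3, seed=0))
```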

Any technique in molecular biology can be used, e.g., random PCR mutagenesis, 
see, e.g., Rice (1992) Proc. Natl. Acad. Sci. USA 89:5467-5471; or, combinatorial 
multiple cassette mutagenesis, see, e.g., Crameri (1995) Biotechniques 18:194-196. 
Alternatively, nucleic acids, e.g., genes, can be reassembled after random, or "stochastic," 
fragmentation, see, e.g., U.S. Patent Nos. 6,291,242; 6,287,862; 6,287,861; 5,955,358; 
5,830,721; 5,824,514; 5,811,238; 5,605,793. In alternative aspects, modifications, 
additions or deletions are introduced by error-prone PCR, shuffling, oligonucleotide- 
directed mutagenesis, assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, 
cassette mutagenesis, recursive ensemble mutagenesis, exponential ensemble 
mutagenesis, site-specific mutagenesis, gene reassembly, Gene Site Saturation 
Mutagenesis (GSSM), synthetic ligation reassembly (SLR), recombination, recursive 
sequence recombination, phosphothioate-modified DNA mutagenesis, uracil-containing 
template mutagenesis, gapped duplex mutagenesis, point mismatch repair mutagenesis, 
repair-deficient host strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, 
deletion mutagenesis, restriction-selection mutagenesis, restriction-purification 
mutagenesis, artificial gene synthesis, ensemble mutagenesis, chimeric nucleic acid 
multimer creation, Chromosomal Saturation Mutagenesis (CSM) and/or a combination of 
these and other methods. 
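
The outcome of reassembling genes after fragmentation can be pictured, in a heavily simplified way, as building chimeras from segments of pre-aligned parent sequences. The Python sketch below is such a picture and is not any of the patented shuffling protocols: it assumes parents of equal length that are already aligned, chooses crossover points uniformly at random, and ignores the homology-dependent annealing that drives real reassembly. All names are hypothetical.

```python
import random


def shuffle_parents(parents, n_crossovers, seed=None):
    """Build one chimera by copying each segment from a randomly chosen parent.

    `parents` must be pre-aligned sequences of equal length; segment
    boundaries are `n_crossovers` distinct random interior positions.
    """
    rng = random.Random(seed)
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be pre-aligned to equal length")
    points = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + points + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        donor = rng.choice(parents)  # any parent may donate any segment
        segments.append(donor[start:end])
    return "".join(segments)


if __name__ == "__main__":
    family = ["ATGGCCAAGCTT", "ATGGCTAAACTG", "ATGACCAAGTTG"]  # hypothetical
    print(shuffle_parents(family, n_crossovers=3, seed=0))
```

Repeating the call with different seeds gives a small in silico library of chimeras that could then be scored, mirroring the recombine-then-screen cycle described in the text.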

The following publications describe a variety of recursive recombination 
procedures and/or methods which can be incorporated into the methods of the invention: 
Stemmer (1999) "Molecular breeding of viruses for targeting and other clinical 
properties" Tumor Targeting 4:1-4; Ness (1999) Nature Biotechnology 17:893-896; 
Chang (1999) "Evolution of a cytokine using DNA family shuffling" Nature 
Biotechnology 17:793-797; Minshull (1999) "Protein evolution by molecular breeding" 
Current Opinion in Chemical Biology 3:284-290; Christians (1999) "Directed evolution 
of thymidine kinase for AZT phosphorylation using DNA family shuffling" Nature 
Biotechnology 17:259-264; Crameri (1998) "DNA shuffling of a family of genes from 
diverse species accelerates directed evolution" Nature 391:288-291; Crameri (1997) 
"Molecular evolution of an arsenate detoxification pathway by DNA shuffling," Nature 
Biotechnology 15:436-438; Zhang (1997) "Directed evolution of an effective fucosidase 
from a galactosidase by DNA shuffling and screening" Proc. Natl. Acad. Sci. USA 
94:4504-4509; Patten et al. (1997) "Applications of DNA Shuffling to Pharmaceuticals 
and Vaccines" Current Opinion in Biotechnology 8:724-733; Crameri et al. (1996) 
"Construction and evolution of antibody-phage libraries by DNA shuffling" Nature 
Medicine 2:100-103; Gates et al. (1996) "Affinity selective isolation of ligands from 
peptide libraries through display on a lac repressor 'headpiece dimer'" Journal of 
Molecular Biology 255:373-386; Stemmer (1996) "Sexual PCR and Assembly PCR" In: 
The Encyclopedia of Molecular Biology. VCH Publishers, New York, pp. 447-457; 
Crameri and Stemmer (1995) "Combinatorial multiple cassette mutagenesis creates all the 
permutations of mutant and wildtype cassettes" BioTechniques 18:194-195; Stemmer et 
al. (1995) "Single-step assembly of a gene and entire plasmid from large numbers of 
oligodeoxyribonucleotides" Gene, 164:49-53; Stemmer (1995) "The Evolution of 
Molecular Computation" Science 270:1510; Stemmer (1995) "Searching Sequence 
Space" Bio/Technology 13:549-553; Stemmer (1994) "Rapid evolution of a protein in 
vitro by DNA shuffling" Nature 370:389-391; and Stemmer (1994) "DNA shuffling by 
random fragmentation and reassembly: In vitro recombination for molecular evolution." 
Proc. Natl. Acad. Sci. USA 91:10747-10751. 

Mutational methods of generating diversity include, for example, site-directed 
mutagenesis (Ling et al. (1997) "Approaches to DNA mutagenesis: an overview" Anal 
Biochem. 254(2):157-178; Dale et al. (1996) "Oligonucleotide-directed random 
mutagenesis using the phosphorothioate method" Methods Mol. Biol. 57:369-374; Smith 
(1985) "In vitro mutagenesis" Ann. Rev. Genet. 19:423-462; Botstein & Shortle (1985) 
"Strategies and applications of in vitro mutagenesis" Science 229:1193-1201; Carter 
(1986) "Site-directed mutagenesis" Biochem. J. 237:1-7; and Kunkel (1987) "The 
efficiency of oligonucleotide directed mutagenesis" in Nucleic Acids & Molecular 
Biology (Eckstein, F. and Lilley, D. M. J. eds., Springer Verlag, Berlin)); mutagenesis 
using uracil containing templates (Kunkel (1985) "Rapid and efficient site-specific 
mutagenesis without phenotypic selection" Proc. Natl. Acad. Sci. USA 82:488-492; 
Kunkel et al. (1987) "Rapid and efficient site-specific mutagenesis without phenotypic 
selection" Methods in Enzymol. 154, 367-382; and Bass et al. (1988) "Mutant Trp 
repressors with new DNA-binding specificities" Science 242:240-245); oligonucleotide- 
directed mutagenesis (Methods in Enzymol. 100:468-500 (1983); Methods in Enzymol. 
154:329-350 (1987); Zoller (1982) "Oligonucleotide-directed mutagenesis using M13- 
derived vectors: an efficient and general procedure for the production of point mutations 
in any DNA fragment" Nucleic Acids Res. 10:6487-6500; Zoller & Smith (1983) 
"Oligonucleotide-directed mutagenesis of DNA fragments cloned into M13 vectors" 
Methods in Enzymol. 100:468-500; and Zoller (1987) "Oligonucleotide-directed 
mutagenesis: a simple method using two oligonucleotide primers and a single-stranded 
DNA template" Methods in Enzymol. 154:329-350); phosphorothioate-modified DNA 
mutagenesis (Taylor (1985) "The use of phosphorothioate-modified DNA in restriction 
enzyme reactions to prepare nicked DNA" Nucl. Acids Res. 13:8749-8764; Taylor 
(1985) "The rapid generation of oligonucleotide-directed mutations at high frequency 
using phosphorothioate-modified DNA" Nucl. Acids Res. 13:8765-8787 (1985); 
Nakamaye (1986) "Inhibition of restriction endonuclease Nci I cleavage by 
phosphorothioate groups and its application to oligonucleotide-directed mutagenesis" 
Nucl. Acids Res. 14:9679-9698; Sayers (1988) "5'-3' Exonucleases in phosphorothioate- 
based oligonucleotide-directed mutagenesis" Nucl. Acids Res. 16:791-802; and Sayers et 
al. (1988) "Strand specific cleavage of phosphorothioate-containing DNA by reaction 
with restriction endonucleases in the presence of ethidium bromide" Nucl. Acids Res. 16: 
803-814); mutagenesis using gapped duplex DNA (Kramer et al. (1984) "The gapped 
duplex DNA approach to oligonucleotide-directed mutation construction" Nucl. Acids 
Res. 12:9441-9456; Kramer & Fritz (1987) Methods in Enzymol. "Oligonucleotide- 
directed construction of mutations via gapped duplex DNA" 154:350-367; Kramer (1988) 
"Improved enzymatic in vitro reactions in the gapped duplex DNA approach to 
oligonucleotide-directed construction of mutations" Nucl. Acids Res. 16:7207; and Fritz 
(1988) "Oligonucleotide-directed construction of mutations: a gapped duplex DNA 
procedure without enzymatic reactions in vitro" Nucl. Acids Res. 16:6987-6999). 
Additional protocols that can be used to practice the invention include point 
mismatch repair (Kramer (1984) "Point Mismatch Repair" Cell 38:879-887), mutagenesis 
using repair-deficient host strains (Carter et al. (1985) "Improved oligonucleotide site- 
directed mutagenesis using M13 vectors" Nucl. Acids Res. 13:4431-4443; and Carter 
(1987) "Improved oligonucleotide-directed mutagenesis using M13 vectors" Methods in 
Enzymol. 154:382-403), deletion mutagenesis (Eghtedarzadeh (1986) "Use of 
oligonucleotides to generate large deletions" Nucl. Acids Res. 14:5115), restriction- 
selection and restriction-purification (Wells et al. (1986) 
"Importance of hydrogen-bond formation in stabilizing the transition state of subtilisin" 
Phil. Trans. R. Soc. Lond. A 317:415-423), mutagenesis by total gene synthesis 
(Nambiar et al. (1984) "Total synthesis and cloning of a gene coding for the ribonuclease 
S protein" Science 223:1299-1301; Sakamar and Khorana (1988) "Total synthesis and 
expression of a gene for the a-subunit of bovine rod outer segment guanine nucleotide- 
binding protein (transducin)" Nucl. Acids Res. 14:6361-6372; Wells et al. (1985) 
"Cassette mutagenesis: an efficient method for generation of multiple mutations at 
defined sites" Gene 34:315-323; and Grundstrom et al. (1985) "Oligonucleotide-directed 
mutagenesis by microscale 'shot-gun' gene synthesis" Nucl. Acids Res. 13:3305-3316), 
double-strand break repair (Mandecki (1986); Arnold (1993) "Protein engineering for 
unusual environments" Current Opinion in Biotechnology 4:450-455. "Oligonucleotide- 
directed double-strand break repair in plasmids of Escherichia coli: a method for site- 
specific mutagenesis" Proc. Natl. Acad. Sci. USA, 83:7177-7181). Additional details on 
many of the above methods can be found in Methods in Enzymology Volume 154, which 
also describes useful controls for trouble-shooting problems with various mutagenesis 
methods. 

Protocols that can be used to practice the invention are described, e.g., in U.S. 
Patent Nos. 5,605,793 to Stemmer (Feb. 25, 1997), "Methods for In Vitro 
Recombination;" U.S. Pat. No. 5,811,238 to Stemmer et al. (Sep. 22, 1998) "Methods for 
Generating Polynucleotides having Desired Characteristics by Iterative Selection and 
Recombination;" U.S. Pat. No. 5,830,721 to Stemmer et al. (Nov. 3, 1998), "DNA 
Mutagenesis by Random Fragmentation and Reassembly;" U.S. Pat. No. 5,834,252 to 
Stemmer, et al. (Nov. 10, 1998) "End-Complementary Polymerase Reaction;" U.S. Pat. 
No. 5,837,458 to Minshull, et al. (Nov. 17, 1998), "Methods and Compositions for 
Cellular and Metabolic Engineering;" WO 95/22625, Stemmer and Crameri, 
"Mutagenesis by Random Fragmentation and Reassembly;" WO 96/33207 by Stemmer 
and Lipschutz "End Complementary Polymerase Chain Reaction;" WO 97/20078 by 
Stemmer and Crameri "Methods for Generating Polynucleotides having Desired 
Characteristics by Iterative Selection and Recombination;" WO 97/35966 by Minshull 
and Stemmer, "Methods and Compositions for Cellular and Metabolic Engineering;" WO 
99/41402 by Punnonen et al. "Targeting of Genetic Vaccine Vectors;" WO 99/41383 by 
Punnonen et al. "Antigen Library Immunization;" WO 99/41369 by Punnonen et al. 
"Genetic Vaccine Vector Engineering;" WO 99/41368 by Punnonen et al. "Optimization 
of Immunomodulatory Properties of Genetic Vaccines;" EP 752008 by Stemmer and 
Crameri, "DNA Mutagenesis by Random Fragmentation and Reassembly;" EP 0932670 
by Stemmer "Evolving Cellular DNA Uptake by Recursive Sequence Recombination;" 
WO 99/23107 by Stemmer et al., "Modification of Virus Tropism and Host Range by 
Viral Genome Shuffling;" WO 99/21979 by Apt et al., "Human Papillomavirus Vectors;" 
WO 98/31837 by del Cardayre et al. "Evolution of Whole Cells and Organisms by 
Recursive Sequence Recombination;" WO 98/27230 by Patten and Stemmer, "Methods 
and Compositions for Polypeptide Engineering;" WO 98/27230 by Stemmer et al., 
"Methods for Optimization of Gene Therapy by Recursive Sequence Shuffling and 
Selection," WO 00/00632, "Methods for Generating Highly Diverse Libraries," WO 
00/09679, "Methods for Obtaining in Vitro Recombined Polynucleotide Sequence Banks, 
and Resulting Sequences," WO 98/42832 by Arnold et al., "Recombination of 
Polynucleotide Sequences Using Random or Defined Primers," WO 99/29902 by Arnold 
et al., "Method for Creating Polynucleotide and Polypeptide Sequences," WO 98/41653 
by Vind, "An in Vitro Method for Construction of a DNA Library," WO 98/41622 by 
Borchert et al., "Method for Constructing a Library Using DNA Shuffling," and WO 
98/42727 by Pati and Zarling, "Sequence Alterations using Homologous Recombination." 

Protocols that can be used to practice the invention (providing details regarding
various diversity generating methods) are described, e.g., in U.S. Patent application serial
no. (USSN) 09/407,800, "SHUFFLING OF CODON ALTERED GENES" by Patten et
al. filed Sep. 28, 1999; "EVOLUTION OF WHOLE CELLS AND ORGANISMS BY
RECURSIVE SEQUENCE RECOMBINATION" by del Cardayre et al., United States
Patent No. 6,379,964; "OLIGONUCLEOTIDE MEDIATED NUCLEIC ACID
RECOMBINATION" by Crameri et al., United States Patent Nos. 6,319,714; 6,368,861;
6,376,246; 6,423,542; 6,426,224 and PCT/US00/01203; "USE OF CODON-VARIED
OLIGONUCLEOTIDE SYNTHESIS FOR SYNTHETIC SHUFFLING" by Welch et al.,
United States Patent No. 6,436,675; "METHODS FOR MAKING CHARACTER
STRINGS, POLYNUCLEOTIDES & POLYPEPTIDES HAVING DESIRED
CHARACTERISTICS" by Selifonov et al., filed Jan. 18, 2000, (PCT/US00/01202) and,
e.g. "METHODS FOR MAKING CHARACTER STRINGS, POLYNUCLEOTIDES &
POLYPEPTIDES HAVING DESIRED CHARACTERISTICS" by Selifonov et al., filed
Jul. 18, 2000 (U.S. Ser. No. 09/618,579); "METHODS OF POPULATING DATA
STRUCTURES FOR USE IN EVOLUTIONARY SIMULATIONS" by Selifonov and
Stemmer, filed Jan. 18, 2000 (PCT/US00/01138); and "SINGLE-STRANDED NUCLEIC
ACID TEMPLATE-MEDIATED RECOMBINATION AND NUCLEIC ACID
FRAGMENT ISOLATION" by Affholter, filed Sep. 6, 2000 (U.S. Ser. No. 09/656,549);
and United States Patent Nos. 6,177,263; 6,153,410.

Non-stochastic, or "directed evolution," methods, including, e.g., saturation
mutagenesis, such as Gene Site Saturation Mutagenesis (GSSM), synthetic ligation
reassembly (SLR), or a combination thereof, are used to modify the nucleic acids of the
invention to generate cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzymes with new or altered properties (e.g., activity under highly
acidic or alkaline conditions, high or low temperatures, and the like). Polypeptides
encoded by the modified nucleic acids can be screened for an activity before testing for
glucan hydrolysis or other activity. Any testing modality or protocol can be used, e.g.,
using a capillary array platform. See, e.g., U.S. Patent Nos. 6,361,974; 6,280,926;
5,939,250.

Gene Site Saturation mutagenesis, or, GSSM
The invention also provides methods for making enzyme using Gene Site
Saturation mutagenesis, or, GSSM, as described herein, and also in U.S. Patent Nos.
6,171,820 and 6,579,258. In one aspect, codon primers containing a degenerate N,N,G/T
sequence are used to introduce point mutations into a polynucleotide, e.g., a cellulase,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme or an
antibody of the invention, so as to generate a set of progeny polypeptides in which a full
range of single amino acid substitutions is represented at each amino acid position, e.g.,
an amino acid residue in an enzyme active site or ligand binding site targeted to be
modified. These oligonucleotides can comprise a contiguous first homologous sequence,
a degenerate N,N,G/T sequence, and, optionally, a second homologous sequence. The
downstream progeny translational products from the use of such oligonucleotides include
all possible amino acid changes at each amino acid site along the polypeptide, because the
degeneracy of the N,N,G/T sequence includes codons for all 20 amino acids. In one
aspect, one such degenerate oligonucleotide (comprised of, e.g., one degenerate N,N,G/T
cassette) is used for subjecting each original codon in a parental polynucleotide template
to a full range of codon substitutions. In another aspect, at least two degenerate cassettes
are used, either in the same oligonucleotide or not, for subjecting at least two original
codons in a parental polynucleotide template to a full range of codon substitutions. For
example, more than one N,N,G/T sequence can be contained in one oligonucleotide to
introduce amino acid mutations at more than one site. This plurality of N,N,G/T
sequences can be directly contiguous, or separated by one or more additional nucleotide
sequence(s). In another aspect, oligonucleotides serviceable for introducing additions and
deletions can be used either alone or in combination with the codons containing an
N,N,G/T sequence, to introduce any combination or permutation of amino acid additions,
deletions, and/or substitutions.

In one aspect, simultaneous mutagenesis of two or more contiguous amino acid
positions is done using an oligonucleotide that contains contiguous N,N,G/T triplets, i.e. a
degenerate (N,N,G/T)n sequence. In another aspect, degenerate cassettes having less
degeneracy than the N,N,G/T sequence are used. For example, it may be desirable in
some instances to use (e.g. in an oligonucleotide) a degenerate triplet sequence comprised
of only one N, where said N can be in the first, second or third position of the triplet. Any
other bases including any combinations and permutations thereof can be used in the
remaining two positions of the triplet. Alternatively, it may be desirable in some
instances to use (e.g. in an oligo) a degenerate N,N,N triplet sequence.

In one aspect, use of degenerate triplets (e.g., N,N,G/T triplets) allows for
systematic and easy generation of a full range of possible natural amino acids (for a total
of 20 amino acids) into each and every amino acid position in a polypeptide (in
alternative aspects, the methods also include generation of less than all possible
substitutions per amino acid residue, or codon, position). For example, for a 100 amino
acid polypeptide, 2000 distinct species (i.e. 20 possible amino acids per position X 100
amino acid positions) can be generated. Through the use of an oligonucleotide or set of
oligonucleotides containing a degenerate N,N,G/T triplet, 32 individual sequences can
code for all 20 possible natural amino acids. Thus, in a reaction vessel in which a
parental polynucleotide sequence is subjected to saturation mutagenesis using at least one
such oligonucleotide, there are generated 32 distinct progeny polynucleotides encoding
20 distinct polypeptides. In contrast, the use of a non-degenerate oligonucleotide in
site-directed mutagenesis leads to only one progeny polypeptide product per reaction vessel.
Nondegenerate oligonucleotides can optionally be used in combination with degenerate
primers disclosed; for example, nondegenerate oligonucleotides can be used to generate
specific point mutations in a working polynucleotide. This provides one means to
generate specific silent point mutations, point mutations leading to corresponding amino
acid changes, and point mutations that cause the generation of stop codons and the
corresponding expression of polypeptide fragments.
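
The codon arithmetic in the preceding paragraph can be checked with a short sketch. The following Python is illustrative only and is not part of the disclosed methods; it assumes the standard genetic code, and the names CODON_TABLE, nnk_codons and library_size are hypothetical helpers introduced here. It enumerates every N,N,G/T triplet, translates each one, and reproduces the 20 x 100 species count for a 100 amino acid polypeptide.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order
# ("*" marks a stop codon). Assumption: the standard code applies.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}


def nnk_codons():
    """Return every triplet matching N,N,G/T (any base, any base, G or T)."""
    return [a + b + c for a, b, c in product("ACGT", "ACGT", "GT")]


def library_size(n_positions, residues_per_position=20):
    """Single-site species count: residues per position times positions."""
    return n_positions * residues_per_position


codons = nnk_codons()
amino_acids = sorted({CODON_TABLE[c] for c in codons} - {"*"})
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]

print(len(codons), "codons encode", len(amino_acids), "amino acids")
print("stop codons within the set:", stop_codons)
print("species for a 100-residue polypeptide:", library_size(100))
```

Under the standard code, the 32 triplets cover all 20 amino acids and also include one stop codon, which a screening design may need to take into account.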

In one aspect, each saturation mutagenesis reaction vessel contains
polynucleotides encoding at least 20 progeny polypeptide (e.g., cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes)
molecules such that all 20 natural amino acids are represented at the one specific amino
acid position corresponding to the codon position mutagenized in the parental
polynucleotide (other aspects use less than all 20 natural combinations). The 32-fold
degenerate progeny polypeptides generated from each saturation mutagenesis reaction
vessel can be subjected to clonal amplification (e.g. cloned into a suitable host, e.g., E.
coli host, using, e.g., an expression vector) and subjected to expression screening. When
an individual progeny polypeptide is identified by screening to display a favorable change
in property (when compared to the parental polypeptide, such as increased glucan
hydrolysis activity under alkaline or acidic conditions), it can be sequenced to identify the
correspondingly favorable amino acid substitution contained therein.

In one aspect, upon mutagenizing each and every amino acid position in a parental
polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid
changes may be identified at more than one amino acid position. One or more new
progeny molecules can be generated that contain a combination of all or part of these
favorable amino acid substitutions. For example, if 2 specific favorable amino acid
changes are identified in each of 3 amino acid positions in a polypeptide, the
permutations include 3 possibilities at each position (no change from the original amino
acid, and each of two favorable changes) and 3 positions. Thus, there are 3 x 3 x 3 or 27
total possibilities, including 7 that were previously examined - 6 single point mutations
(i.e. 2 at each of three positions) and no change at any position.
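
The count in this example can be reproduced by direct enumeration. The sketch below is illustrative only; the labels "wt", "a1", "a2" and so on are hypothetical placeholders for the original residue and the two favorable changes at each of the three positions.

```python
from itertools import product


def combinations_at_positions(options_per_position):
    """Enumerate every combination of one option per position."""
    return list(product(*options_per_position))


# Hypothetical labels: "wt" is the original residue, the others are the
# two favorable changes found at that position.
options = [
    ("wt", "a1", "a2"),
    ("wt", "b1", "b2"),
    ("wt", "c1", "c2"),
]

combos = combinations_at_positions(options)

# Combinations with at most one change were already seen in the
# single-site screen: the parent plus 2 single mutants at each position.
already_examined = [c for c in combos if sum(x != "wt" for x in c) <= 1]
new_combos = [c for c in combos if c not in already_examined]

print(len(combos), "total,", len(already_examined), "already examined,",
      len(new_combos), "new")
```

The remaining 20 combinations are the double and triple mutants that would be built and screened next.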

In yet another aspect, site-saturation mutagenesis can be used together with
shuffling, chimerization, recombination and other mutagenizing processes, along with
screening. This invention provides for the use of any mutagenizing process(es), including
saturation mutagenesis, in an iterative manner. In one exemplification, the iterative use of
any mutagenizing process(es) is used in combination with screening.

The invention also provides for the use of proprietary codon primers (containing a
degenerate N,N,N sequence) to introduce point mutations into a polynucleotide, so as to
generate a set of progeny polypeptides in which a full range of single amino acid
substitutions is represented at each amino acid position (Gene Site Saturation
Mutagenesis (GSSM)). The oligos used are comprised contiguously of a first
homologous sequence, a degenerate N,N,N sequence and in one aspect but not
necessarily a second homologous sequence. The downstream progeny translational
products from the use of such oligos include all possible amino acid changes at each
amino acid site along the polypeptide, because the degeneracy of the N,N,N sequence
includes codons for all 20 amino acids.

In one aspect, one such degenerate oligo (comprised of one degenerate N,N,N
cassette) is used for subjecting each original codon in a parental polynucleotide template
to a full range of codon substitutions. In another aspect, at least two degenerate N,N,N
cassettes are used, either in the same oligo or not, for subjecting at least two original
codons in a parental polynucleotide template to a full range of codon substitutions. Thus,
more than one N,N,N sequence can be contained in one oligo to introduce amino acid
mutations at more than one site. This plurality of N,N,N sequences can be directly
contiguous, or separated by one or more additional nucleotide sequence(s). In another
aspect, oligos serviceable for introducing additions and deletions can be used either alone
or in combination with the codons containing an N,N,N sequence, to introduce any
combination or permutation of amino acid additions, deletions and/or substitutions.

In one aspect, it is possible to simultaneously mutagenize two or more contiguous
amino acid positions using an oligo that contains contiguous N,N,N triplets, i.e. a
degenerate (N,N,N)n sequence. In another aspect, the present invention provides for the
use of degenerate cassettes having less degeneracy than the N,N,N sequence. For
example, it may be desirable in some instances to use (e.g. in an oligo) a degenerate
triplet sequence comprised of only one N, where the N can be in the first, second or third
position of the triplet. Any other bases including any combinations and permutations
thereof can be used in the remaining two positions of the triplet. Alternatively, it may be
desirable in some instances to use (e.g., in an oligo) a degenerate N,N,N triplet sequence,
N,N,G/T, or an N,N,G/C triplet sequence.

In one aspect, use of a degenerate triplet (such as N,N,G/T or an N,N,G/C triplet
sequence) is advantageous for several reasons. In one aspect, this invention provides a
means to systematically and fairly easily generate the substitution of the full range of
possible amino acids (for a total of 20 amino acids) into each and every amino acid
position in a polypeptide. Thus, for a 100 amino acid polypeptide, the invention provides
a way to systematically and fairly easily generate 2000 distinct species (i.e., 20 possible
amino acids per position times 100 amino acid positions). It is appreciated that there is
provided, through the use of an oligo containing a degenerate N,N,G/T or an N,N,G/C
triplet sequence, 32 individual sequences that code for 20 possible amino acids. Thus, in
a reaction vessel in which a parental polynucleotide sequence is subjected to saturation
mutagenesis using one such oligo, there are generated 32 distinct progeny polynucleotides
encoding 20 distinct polypeptides. In contrast, the use of a non-degenerate oligo in
site-directed mutagenesis leads to only one progeny polypeptide product per reaction vessel.
This invention also provides for the use of nondegenerate oligos, which can
optionally be used in combination with degenerate primers disclosed. It is appreciated
that in some situations, it is advantageous to use nondegenerate oligos to generate specific
point mutations in a working polynucleotide. This provides a means to generate specific
silent point mutations, point mutations leading to corresponding amino acid changes and
point mutations that cause the generation of stop codons and the corresponding
expression of polypeptide fragments.

Thus, in one aspect of this invention, each saturation mutagenesis reaction vessel
contains polynucleotides encoding at least 20 progeny polypeptide molecules such that all
20 amino acids are represented at the one specific amino acid position corresponding to
the codon position mutagenized in the parental polynucleotide. The 32-fold degenerate
progeny polypeptides generated from each saturation mutagenesis reaction vessel can be
subjected to clonal amplification (e.g., cloned into a suitable E. coli host using an
expression vector) and subjected to expression screening. When an individual progeny
polypeptide is identified by screening to display a favorable change in property (when
compared to the parental polypeptide), it can be sequenced to identify the
correspondingly favorable amino acid substitution contained therein.

In one aspect, upon mutagenizing each and every amino acid position in a parental
polypeptide using saturation mutagenesis as disclosed herein, favorable amino acid
changes are identified at more than one amino acid position. One or more new progeny
molecules can be generated that contain a combination of all or part of these favorable
amino acid substitutions. For example, if 2 specific favorable amino acid changes are
identified in each of 3 amino acid positions in a polypeptide, the permutations include 3
possibilities at each position (no change from the original amino acid and each of two
favorable changes) and 3 positions. Thus, there are 3 x 3 x 3 or 27 total possibilities,
including 7 that were previously examined - 6 single point mutations (i.e., 2 at each of
three positions) and no change at any position.

The invention provides for the use of saturation mutagenesis in combination with
additional mutagenization processes, such as a process where two or more related
polynucleotides are introduced into a suitable host cell such that a hybrid polynucleotide
is generated by recombination and reductive reassortment.

In addition to performing mutagenesis along the entire sequence of a gene, the
instant invention provides that mutagenesis can be used to replace each of any number of
bases in a polynucleotide sequence, wherein the number of bases to be mutagenized is in
one aspect every integer from 15 to 100,000. Thus, instead of mutagenizing every
position along a molecule, one can subject every or a discrete number of bases (in one
aspect a subset totaling from 15 to 100,000) to mutagenesis. In one aspect, a separate
nucleotide is used for mutagenizing each position or group of positions along a
polynucleotide sequence. A group of 3 positions to be mutagenized may be a codon. The
mutations can be introduced using a mutagenic primer, containing a heterologous
cassette, also referred to as a mutagenic cassette. Exemplary cassettes can have from 1 to
500 bases. Each nucleotide position in such heterologous cassettes can be N, A, C, G, T,
A/C, A/G, A/T, C/G, C/T, G/T, C/G/T, A/G/T, A/C/T, A/C/G, or E, where E is any base
that is not A, C, G, or T (E can be referred to as a designer oligo).
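
The per-position notation listed above (N, a single base, or slash-separated alternatives such as A/C or C/G/T) lends itself to a simple expansion routine. The following Python is a minimal sketch under stated assumptions: the function names expand_position and expand_cassette are hypothetical, and the designer base E is deliberately not modeled because the text does not tie it to a specific set of bases.

```python
from itertools import product


def expand_position(symbol):
    """Return the bases allowed at one cassette position.

    "N" means any of A, C, G, T; "A/C" means A or C; "G" means G only.
    The designer base "E" is not handled and raises ValueError.
    """
    if symbol == "N":
        return list("ACGT")
    bases = symbol.split("/")
    if any(len(b) != 1 or b not in "ACGT" for b in bases):
        raise ValueError(f"unsupported cassette symbol: {symbol!r}")
    return bases


def expand_cassette(positions):
    """Enumerate every concrete sequence a degenerate cassette encodes."""
    choices = [expand_position(s) for s in positions]
    return ["".join(seq) for seq in product(*choices)]


# A single N,N,G/T triplet and a less degenerate two-base cassette.
nnk = expand_cassette(["N", "N", "G/T"])
small = expand_cassette(["A/C", "G"])

print(len(nnk), "sequences for N,N,G/T")
print(small)
```

Because the number of concrete sequences is the product of the choices at each position, long cassettes with many N positions grow quickly, which is one reason the less degenerate options may be preferred.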

In one aspect, saturation mutagenesis is comprised of mutagenizing a complete set
of mutagenic cassettes (wherein each cassette is in one aspect about 1-500 bases in
length) in a defined polynucleotide sequence to be mutagenized (wherein the sequence to
be mutagenized is in one aspect from about 15 to 100,000 bases in length). Thus, a group
of mutations (ranging from 1 to 100 mutations) is introduced into each cassette to be



wo 2006/101584 



98 



PCT/US2006/002516 



5644620 1 4240/D2 1 50-2 WO 

mutagenized. A grouping of mutations to be introduced into one cassette can be different
from or the same as a second grouping of mutations to be introduced into a second cassette
during the application of one round of saturation mutagenesis. Such groupings are
exemplified by deletions, additions, groupings of particular codons and groupings of
particular nucleotide cassettes.

In one aspect, defined sequences to be mutagenized include a whole gene,
pathway, cDNA, an entire open reading frame (ORF) and entire promoter, enhancer,
repressor/transactivator, origin of replication, intron, operator, or any polynucleotide
functional group. Generally, a "defined sequence" for this purpose may be any
polynucleotide that is a 15 base polynucleotide sequence, or a polynucleotide sequence of
a length between 15 bases and 15,000 bases (this invention specifically names every
integer in between). Considerations in choosing groupings of codons include types of
amino acids encoded by a degenerate mutagenic cassette.

In one aspect, as a grouping of mutations that can be introduced into a mutagenic
cassette, this invention specifically provides for degenerate codon substitutions (using
degenerate oligos) that code for 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
and 20 amino acids at each position and a library of polypeptides encoded thereby.

Synthetic Ligation Reassembly (SLR) 

The invention provides a non-stochastic gene modification system termed
"synthetic ligation reassembly," or simply "SLR," a "directed evolution process," to
generate polypeptides, e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzymes or antibodies of the invention, with new or altered
properties.

SLR is a method of ligating oligonucleotide fragments together non-stochastically.
This method differs from stochastic oligonucleotide shuffling in that the nucleic acid
building blocks are not shuffled, concatenated or chimerized randomly, but rather are
assembled non-stochastically. See, e.g., U.S. Patent Nos. 6,773,900; 6,740,506;
6,713,282; 6,635,449; 6,605,449; 6,537,776. In one aspect, SLR comprises the following
steps: (a) providing a template polynucleotide, wherein the template polynucleotide
comprises sequence encoding a homologous gene; (b) providing a plurality of building
block polynucleotides, wherein the building block polynucleotides are designed to
cross-over reassemble with the template polynucleotide at a predetermined sequence, and a
building block polynucleotide comprises a sequence that is a variant of the homologous
gene and a sequence homologous to the template polynucleotide flanking the variant
sequence; (c) combining a building block polynucleotide with a template polynucleotide
such that the building block polynucleotide cross-over reassembles with the template
polynucleotide to generate polynucleotides comprising homologous gene sequence
variations.

SLR does not depend on the presence of high levels of homology between
polynucleotides to be rearranged. Thus, this method can be used to non-stochastically
generate libraries (or sets) of progeny molecules comprised of over 10^100 different
chimeras. SLR can be used to generate libraries comprised of over 10^1000 different
progeny chimeras. Thus, aspects of the present invention include non-stochastic methods
of producing a set of finalized chimeric nucleic acid molecules having an overall
assembly order that is chosen by design. This method includes the steps of generating by
design a plurality of specific nucleic acid building blocks having serviceable mutually
compatible ligatable ends, and assembling these nucleic acid building blocks, such that a
designed overall assembly order is achieved.

The mutually compatible ligatable ends of the nucleic acid building blocks to be
assembled are considered to be "serviceable" for this type of ordered assembly if they
enable the building blocks to be coupled in predetermined orders. Thus, the overall
assembly order in which the nucleic acid building blocks can be coupled is specified by
the design of the ligatable ends. If more than one assembly step is to be used, then the
overall assembly order in which the nucleic acid building blocks can be coupled is also
specified by the sequential order of the assembly step(s). In one aspect, the annealed
building pieces are treated with an enzyme, such as a ligase (e.g. T4 DNA ligase), to
achieve covalent bonding of the building pieces.
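
How the design of the ligatable ends can fix the assembly order may be pictured with a small sketch. The following Python is illustrative only and makes simplifying assumptions not found in the text: each building block is reduced to a pair of end labels, two blocks couple only when the right end of one equals the left end of the next, and the names assembly_order, "5'", "3'", "e1" and "e2" are hypothetical.

```python
def assembly_order(blocks, start_end, stop_end):
    """Derive the unique order implied by matching end labels.

    blocks maps a block name to (left_end, right_end). Ends are
    considered serviceable here only if each left end occurs once,
    so that every coupling is unambiguous.
    """
    by_left = {}
    for name, (left, right) in blocks.items():
        if left in by_left:
            raise ValueError(f"end {left!r} is not unique; order is ambiguous")
        by_left[left] = (name, right)

    order, end = [], start_end
    while end != stop_end:
        if end not in by_left:
            raise ValueError(f"no block can couple to end {end!r}")
        name, end = by_left.pop(end)  # pop so a block is used only once
        order.append(name)
    return order


# Blocks listed out of order; the end labels alone determine the result.
blocks = {
    "B": ("e1", "e2"),
    "C": ("e2", "3'"),
    "A": ("5'", "e1"),
}
print(assembly_order(blocks, "5'", "3'"))
```

In this simplified picture, reusing an end label on two blocks makes the order ambiguous, which corresponds to ends that would not be "serviceable" in the sense described above.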

In one aspect, the design of the oligonucleotide building blocks is obtained by
analyzing a set of progenitor nucleic acid sequence templates that serve as a basis for
producing a progeny set of finalized chimeric polynucleotides. These parental
oligonucleotide templates thus serve as a source of sequence information that aids in the
design of the nucleic acid building blocks that are to be mutagenized, e.g., chimerized or
shuffled. In one aspect of this method, the sequences of a plurality of parental nucleic
acid templates are aligned in order to select one or more demarcation points. The
demarcation points can be located at an area of homology, and are comprised of one or
more nucleotides. These demarcation points are in one aspect shared by at least two of
the progenitor templates. The demarcation points can thereby be used to delineate the
boundaries of oligonucleotide building blocks to be generated in order to rearrange the
parental polynucleotides. The demarcation points identified and selected in the
progenitor molecules serve as potential chimerization points in the assembly of the final
chimeric progeny molecules. A demarcation point can be an area of homology
(comprised of at least one homologous nucleotide base) shared by at least two parental
polynucleotide sequences. Alternatively, a demarcation point can be an area of homology
that is shared by at least half of the parental polynucleotide sequences, or, it can be an
area of homology that is shared by at least two thirds of the parental polynucleotide
sequences. In another aspect, a serviceable demarcation point is an area of
homology that is shared by at least three fourths of the parental polynucleotide sequences,
or, it can be shared by almost all of the parental polynucleotide sequences. In one
aspect, a demarcation point is an area of homology that is shared by all of the parental
polynucleotide sequences.
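
The selection of candidate demarcation points from aligned parental sequences can be sketched as a column-by-column scan. The following Python is a minimal illustration under stated assumptions: the parents are already aligned to equal length with "-" for gaps, a single column stands in for an "area of homology", and the function name demarcation_points and its parameters are hypothetical.

```python
from collections import Counter


def demarcation_points(aligned, min_fraction=1.0, min_parents=2):
    """Return alignment columns where enough parents share the same base.

    min_fraction is the share of all parents that must agree (1.0 = all,
    0.5 = at least half); min_parents enforces the "at least two" floor.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    points = []
    for i in range(len(aligned[0])):
        column = [s[i] for s in aligned if s[i] != "-"]
        if not column:
            continue
        _, count = Counter(column).most_common(1)[0]
        if count >= min_parents and count / len(aligned) >= min_fraction:
            points.append(i)
    return points


# Three toy parental sequences (made up for illustration).
parents = ["ATGCCA", "ATGAAA", "TTGCGA"]
print(demarcation_points(parents))                    # shared by all parents
print(demarcation_points(parents, min_fraction=0.5))  # shared by at least half
```

Relaxing min_fraction from all parents to half of them yields more candidate points, mirroring the graded thresholds (half, two thirds, three fourths, all) described in the paragraph above.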

In one aspect, a ligation reassembly process is performed exhaustively in order to
generate an exhaustive library of progeny chimeric polynucleotides. In other words, all
possible ordered combinations of the nucleic acid building blocks are represented in the
set of finalized chimeric nucleic acid molecules. At the same time, in another aspect, the
assembly order (i.e. the order of assembly of each building block in the 5' to 3' sequence
of each finalized chimeric nucleic acid) in each combination is by design (or
non-stochastic) as described above. Because of the non-stochastic nature of this invention, the
possibility of unwanted side products is greatly reduced.
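
An exhaustive library with a fixed 5' to 3' assembly order can be illustrated as a Cartesian product over the variants available at each slot. The following Python is a sketch only; the slot contents are made-up triplets, and the names exhaustive_library and library_size are hypothetical.

```python
from itertools import product
from math import prod


def exhaustive_library(variants_per_slot):
    """Build every chimera: one variant per slot, slots kept in 5' to 3' order."""
    return ["".join(choice) for choice in product(*variants_per_slot)]


def library_size(variant_counts):
    """Number of ordered combinations: the product of the variant counts."""
    return prod(variant_counts)


# Three slots in designed order, with 2, 3 and 1 variants respectively.
slots = [["ATG", "ATC"], ["GGA", "GGT", "GGC"], ["TAA"]]
library = exhaustive_library(slots)

print(len(library), "chimeras:", library)
# Size grows multiplicatively: 100 slots with 10 variants each give 10**100.
print(library_size([10] * 100) == 10**100)
```

Because the slot order is fixed by design, every member of the library has the same block order and differs only in which variant occupies each slot.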

In another aspect, the ligation reassembly method is performed systematically.
For example, the method is performed in order to generate a systematically
compartmentalized library of progeny molecules, with compartments that can be screened
systematically, e.g. one by one. In other words this invention provides that, through the
selective and judicious use of specific nucleic acid building blocks, coupled with the
selective and judicious use of sequentially stepped assembly reactions, a design can be
achieved where specific sets of progeny products are made in each of several reaction
vessels. This allows a systematic examination and screening procedure to be performed.
Thus, these methods allow a potentially very large number of progeny molecules to be
examined systematically in smaller groups. Because of its ability to perform
chimerizations in a manner that is highly flexible yet exhaustive and systematic as well,
particularly when there is a low level of homology among the progenitor molecules, these
methods provide for the generation of a library (or set) comprised of a large number of
progeny molecules. Because of the non-stochastic nature of the instant ligation
reassembly invention, the progeny molecules generated in one aspect comprise a library
of finalized chimeric nucleic acid molecules having an overall assembly order that is
chosen by design. The saturation mutagenesis and optimized directed evolution methods
also can be used to generate different progeny molecular species. It is appreciated that
the invention provides freedom of choice and control regarding the selection of
demarcation points, the size and number of the nucleic acid building blocks, and the size
and design of the couplings. It is appreciated, furthermore, that the requirement for
intermolecular homology is highly relaxed for the operability of this invention. In fact,
demarcation points can even be chosen in areas of little or no intermolecular homology.

For example, because of codon wobble, i.e. the degeneracy of codons, nucleotide
substitutions can be introduced into nucleic acid building blocks without altering the
amino acid originally encoded in the corresponding progenitor template. Alternatively, a
codon can be altered such that the coding for an original amino acid is altered. This
invention provides that such substitutions can be introduced into the nucleic acid building
block in order to increase the incidence of intermolecular homologous demarcation points
and thus to allow an increased number of couplings to be achieved among the building
blocks, which in turn allows a greater number of progeny chimeric molecules to be
generated.
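
The use of codon degeneracy to create additional shared positions can be illustrated with a short sketch. The following Python assumes the standard genetic code; the names CODON_TABLE, synonymous_codons and best_silent_match are hypothetical. Given a codon from one progenitor and a target codon from another, it picks the synonymous codon that matches the target at the most positions, so the encoded amino acid is unchanged.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)
}


def synonymous_codons(codon):
    """All codons encoding the same amino acid as the given codon."""
    aa = CODON_TABLE[codon]
    return [c for c, a in CODON_TABLE.items() if a == aa]


def best_silent_match(target, codon):
    """Recode `codon` silently to share the most bases with `target`."""
    def identity(candidate):
        return sum(x == y for x, y in zip(candidate, target))
    return max(synonymous_codons(codon), key=identity)


# Two leucine codons that differ at two of three positions can be made
# identical without changing the encoded amino acid.
print(best_silent_match("TTA", "CTG"))
# A glycine codon recoded to share more bases with an alanine codon.
print(best_silent_match("GCT", "GGA"))
```

Each silent recoding of this kind adds matching nucleotides between progenitors, which is the mechanism by which the incidence of homologous demarcation points can be increased.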

Synthetic gene reassembly 

In one aspect, the present invention provides a non-stochastic method termed
synthetic gene reassembly, that is somewhat related to stochastic shuffling, save that the
nucleic acid building blocks are not shuffled or concatenated or chimerized randomly, but
rather are assembled non-stochastically. See, e.g., U.S. Patent No. 6,537,776.

The synthetic gene reassembly method does not depend on the presence of a high
level of homology between polynucleotides to be shuffled. The invention can be used to
non-stochastically generate libraries (or sets) of progeny molecules comprised of over
10^100 different chimeras. Conceivably, synthetic gene reassembly can even be used to
generate libraries comprised of over 10^1000 different progeny chimeras.

Thus, in one aspect, the invention provides a non-stochastic method of producing
a set of finalized chimeric nucleic acid molecules having an overall assembly order that is
chosen by design, which method is comprised of the steps of generating by design a
plurality of specific nucleic acid building blocks having serviceable mutually compatible
ligatable ends and assembling these nucleic acid building blocks, such that a designed
overall assembly order is achieved.

The mutually compatible ligatable ends of the nucleic acid building blocks to be
assembled are considered to be "serviceable" for this type of ordered assembly if they
enable the building blocks to be coupled in predetermined orders. Thus, in one aspect,
the overall assembly order in which the nucleic acid building blocks can be coupled is
specified by the design of the ligatable ends and, if more than one assembly step is to be
used, then the overall assembly order in which the nucleic acid building blocks can be
coupled is also specified by the sequential order of the assembly step(s). In one aspect
of the invention, the annealed building pieces are treated with an enzyme, such as a ligase
(e.g., T4 DNA ligase) to achieve covalent bonding of the building pieces.

In another aspect, the design of nucleic acid building blocks is obtained upon
analysis of the sequences of a set of progenitor nucleic acid templates that serve as a basis
for producing a progeny set of finalized chimeric nucleic acid molecules. These
progenitor nucleic acid templates thus serve as a source of sequence information that aids
in the design of the nucleic acid building blocks that are to be mutagenized, i.e.,
chimerized or shuffled.

In one exemplification, the invention provides for the chimerization of a family of
related genes and their encoded family of related products. In a particular
exemplification, the encoded products are enzymes. The cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of the present invention
can be mutagenized in accordance with the methods described herein.

Thus according to one aspect of the invention, the sequences of a plurality of
progenitor nucleic acid templates (e.g., polynucleotides of the invention) are aligned in
order to select one or more demarcation points, which demarcation points can be located
at an area of homology. The demarcation points can be used to delineate the boundaries
of nucleic acid building blocks to be generated. Thus, the demarcation points identified
and selected in the progenitor molecules serve as potential chimerization points in the
assembly of the progeny molecules.

In one aspect, a serviceable demarcation point is an area of homology (comprised 
of at least one homologous nucleotide base) shared by at least two progenitor templates, 
but the demarcation point can be an area of homology that is shared by at least half of the 
progenitor templates, at least two thirds of the progenitor templates, at least three fourths 
of the progenitor templates and in one aspect at almost all of the progenitor templates. 
In yet another aspect, a serviceable demarcation point is an area of homology that 
is shared by all of the progenitor templates. 

In one aspect, the gene reassembly process is performed exhaustively in order to 
generate an exhaustive library. In other words, all possible ordered combinations of the 
nucleic acid building blocks are represented in the set of finalized chimeric nucleic acid 
molecules. At the same time, the assembly order (i.e., the order of assembly of each 
building block in the 5' to 3' sequence of each finalized chimeric nucleic acid) in each 
combination is by design (or non-stochastic). Because of the non-stochastic nature of the 
method, the possibility of unwanted side products is greatly reduced. 
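A minimal sketch of what "all possible ordered combinations" means in practice is given below. It assumes one interchangeable building block per parent at each position and a fixed, designed position order; the block sequences and the name `exhaustive_library` are hypothetical.

```python
from itertools import product

def exhaustive_library(blocks_by_position):
    """Yield every chimera formed by taking one building block per position,
    always joined in the designed 5'-to-3' position order."""
    for choice in product(*blocks_by_position):
        yield "".join(choice)

# Hypothetical building blocks: 3 positions, one block per parent (A, B) at each.
blocks_by_position = [
    ["ATGGCT", "ATGACT"],  # position 1
    ["AACCTG", "AAGCTG"],  # position 2
    ["GATTAA", "GACTAA"],  # position 3
]
library = list(exhaustive_library(blocks_by_position))
library_size = len(library)  # 2 * 2 * 2 combinations
```

Under these assumptions the library size is the product of the number of alternative blocks at each position, which is why the number of progeny grows so quickly with the number of parents and demarcation points.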

In another aspect, the method provides that the gene reassembly process is 
performed systematically, for example to generate a systematically compartmentalized 
library, with compartments that can be screened systematically, e.g., one by one. In other 
words the invention provides that, through the selective and judicious use of specific 
nucleic acid building blocks, coupled with the selective and judicious use of sequentially 
stepped assembly reactions, an experimental design can be achieved where specific sets 
of progeny products are made in each of several reaction vessels. This allows a 
systematic examination and screening procedure to be performed. Thus, it allows a 
potentially very large number of progeny molecules to be examined systematically in 
smaller groups. 

Because of its ability to perform chimerizations in a manner that is highly flexible 
yet exhaustive and systematic as well, particularly when there is a low level of homology 
among the progenitor molecules, the instant invention provides for the generation of a 
library (or set) comprised of a large number of progeny molecules. Because of the 
non-stochastic nature of the instant gene reassembly invention, the progeny molecules 
generated in one aspect comprise a library of finalized chimeric nucleic acid molecules 
having an overall assembly order that is chosen by design. In a particular aspect, such a 
generated library is comprised of greater than 10^3 to greater than 10^1000 different progeny 
molecular species. 

In one aspect, a set of finalized chimeric nucleic acid molecules, produced as 
described, is comprised of a polynucleotide encoding a polypeptide. According to one 
aspect, this polynucleotide is a gene, which may be a man-made gene. According to 
another aspect, this polynucleotide is a gene pathway, which may be a man-made gene 
pathway. The invention provides that one or more man-made genes generated by the 
invention may be incorporated into a man-made gene pathway, such as a pathway operable 
in a eukaryotic organism (including a plant). 

In another exemplification, the synthetic nature of the step in which the building 
blocks are generated allows the design and introduction of nucleotides (e.g., one or more 
nucleotides, which may be, for example, codons or introns or regulatory sequences) that 
can later be optionally removed in an in vitro process (e.g., by mutagenesis) or in an in 
vivo process (e.g., by utilizing the gene splicing ability of a host organism). It is 
appreciated that in many instances the introduction of these nucleotides may also be 
desirable for many other reasons in addition to the potential benefit of creating a 
serviceable demarcation point. 

Thus, according to another aspect, the invention provides that a nucleic acid 
building block can be used to introduce an intron. Thus, the invention provides that 
functional introns may be introduced into a man-made gene of the invention. The 
invention also provides that functional introns may be introduced into a man-made gene 
pathway of the invention. Accordingly, the invention provides for the generation of a 
chimeric polynucleotide that is a man-made gene containing one (or more) artificially 
introduced intron(s). 

The invention also provides for the generation of a chimeric polynucleotide that is 
a man-made gene pathway containing one (or more) artificially introduced intron(s). In 
one aspect, the artificially introduced intron(s) are functional in one or more host cells for 
gene splicing much in the way that naturally-occurring introns serve functionally in gene 
splicing. The invention provides a process of producing man-made intron-containing 
polynucleotides to be introduced into host organisms for recombination and/or splicing. 

A man-made gene produced using the invention can also serve as a substrate for 
recombination with another nucleic acid. Likewise, a man-made gene pathway produced 
using the invention can also serve as a substrate for recombination with another nucleic 
acid. In one aspect, the recombination is facilitated by, or occurs at, areas of homology 
between the man-made, intron-containing gene and a nucleic acid, which serves as a 
recombination partner. In one aspect, the recombination partner may also be a nucleic 
acid generated by the invention, including a man-made gene or a man-made gene 
pathway. Recombination may be facilitated by or may occur at areas of homology that 
exist at the one (or more) artificially introduced intron(s) in the man-made gene. 

In one aspect, the synthetic gene reassembly method of the invention utilizes a 
plurality of nucleic acid building blocks, each of which in one aspect has two ligatable 
ends. The two ligatable ends on each nucleic acid building block may be two blunt ends 
(i.e., each having an overhang of zero nucleotides), or in one aspect one blunt end and one 
overhang, or in another aspect two overhangs. In one aspect, a useful overhang for 
this purpose may be a 3' overhang or a 5' overhang. Thus, a nucleic acid building block 
may have a 3' overhang or alternatively a 5' overhang or alternatively two 3' overhangs 
or alternatively two 5' overhangs. The overall order in which the nucleic acid building 
blocks are assembled to form a finalized chimeric nucleic acid molecule is determined by 
purposeful experimental design and is not random. 

In one aspect, a nucleic acid building block is generated by chemical 
synthesis of two single-stranded nucleic acids (also referred to as single-stranded oligos) 
and contacting them so as to allow them to anneal to form a double-stranded nucleic acid 
building block. A double-stranded nucleic acid building block can be of variable size. 
The sizes of these building blocks can be small or large. Exemplary sizes for building 
blocks range from 1 base pair (not including any overhangs) to 100,000 base pairs (not 
including any overhangs). Other exemplary size ranges are also provided, which have 
lower limits of from 1 bp to 10,000 bp (including every integer value in between) and 
upper limits of from 2 bp to 100,000 bp (including every integer value in between). 

Many methods exist by which a double-stranded nucleic acid building 
block can be generated that is serviceable for the invention; and these are known in the art 
and can be readily performed by the skilled artisan. According to one aspect, a double- 
stranded nucleic acid building block is generated by first generating two single stranded 
nucleic acids and allowing them to anneal to form a double-stranded nucleic acid building 
block. The two strands of a double-stranded nucleic acid building block may be 
complementary at every nucleotide apart from any that form an overhang; thus containing 
no mismatches, apart from any overhang(s). According to another aspect, the two strands 
of a double-stranded nucleic acid building block are complementary at fewer than every 
nucleotide apart from any that form an overhang. Thus, according to this aspect, a 
double-stranded nucleic acid building block can be used to introduce codon degeneracy. 
In one aspect the codon degeneracy is introduced using the site-saturation mutagenesis 
described herein, using one or more N,N,G/T cassettes or alternatively using one or more 
N,N,N cassettes. 
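To make the difference between the two cassettes concrete, the following sketch enumerates the codons of an N,N,G/T cassette and an N,N,N cassette and translates them with the standard genetic code. The helper name `cassette_codons` is hypothetical, and the sketch assumes the standard code applies to the host in question.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def cassette_codons(third_position_bases):
    """List all codons with any base at positions 1-2 and the given third bases."""
    return ["".join(c) for c in product(BASES, BASES, third_position_bases)]

nnk = cassette_codons("GT")   # N,N,G/T cassette
nnn = cassette_codons(BASES)  # N,N,N cassette
nnk_amino_acids = {CODON_TABLE[c] for c in nnk} - {"*"}
nnk_stops = [c for c in nnk if CODON_TABLE[c] == "*"]
```

Under the standard code, the N,N,G/T cassette covers all twenty amino acids with half as many codons as N,N,N and includes only one stop codon.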

The in vivo recombination method of the invention can be performed 
blindly on a pool of unknown hybrids or alleles of a specific polynucleotide or sequence. 
However, it is not necessary to know the actual DNA or RNA sequence of the specific 
polynucleotide. The approach of using recombination within a mixed population of genes 
can be useful for the generation of any useful proteins, for example, a cellulase of the 
invention or a variant thereof. This approach may be used to generate proteins having 
altered specificity or activity. The approach may also be useful for the generation of 
hybrid nucleic acid sequences, for example, promoter regions, introns, exons, enhancer 
sequences, 3' untranslated regions or 5' untranslated regions of genes. Thus this 
approach may be used to generate genes having increased rates of expression. This 
approach may also be useful in the study of repetitive DNA sequences. Finally, this 
approach may be useful to make ribozymes or aptamers of the invention. 

In one aspect the invention described herein is directed to the use of 
repeated cycles of reductive reassortment, recombination and selection which allow for 
the directed molecular evolution of highly complex linear sequences, such as DNA, RNA 
or proteins through recombination. 

Optimized Directed Evolution System 

The invention provides a non-stochastic gene modification system termed 
"optimized directed evolution system" to generate polypeptides, e.g., cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes or 
antibodies of the invention, with new or altered properties. In one aspect, optimized 
directed evolution is directed to the use of repeated cycles of reductive reassortment, 
recombination and selection that allow for the directed molecular evolution of nucleic 
acids through recombination. 

Optimized directed evolution allows generation of a large population of evolved 
chimeric sequences, wherein the generated population is significantly enriched for 
sequences that have a predetermined number of crossover events. A crossover event is a 
point in a chimeric sequence where a shift in sequence occurs from one parental variant to 
another parental variant. Such a point is normally at the juncture of where 
oligonucleotides from two parents are ligated together to form a single sequence. This 
method allows calculation of the correct concentrations of oligonucleotide sequences so 
that the final chimeric population of sequences is enriched for the chosen number of 
crossover events. This provides more control over choosing chimeric variants having a 
predetermined number of crossover events. 
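The definition of a crossover event can be illustrated with a short sketch that counts the junctions at which the parent of origin changes. The representation of a chimera as a list of parent labels, one per oligonucleotide fragment, is an assumption made for illustration.

```python
def count_crossovers(parent_of_fragment):
    """Count junctions where adjacent fragments come from different parents."""
    return sum(
        left != right
        for left, right in zip(parent_of_fragment, parent_of_fragment[1:])
    )

# Hypothetical chimera: fragments listed 5' to 3' by parent of origin.
chimera = ["A", "A", "B", "B", "B", "C", "A"]
n_crossovers = count_crossovers(chimera)  # junctions A|B, B|C and C|A
```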

In addition, this method provides a convenient means for exploring a tremendous 
amount of the possible protein variant space in comparison to other systems. Previously, 
if one generated, for example, 10^13 chimeric molecules during a reaction, it would be 
extremely difficult to test such a high number of chimeric variants for a particular 
activity. Moreover, a significant portion of the progeny population would have a very 
high number of crossover events which resulted in proteins that were less likely to have 
increased levels of a particular activity. By using these methods, the population of 
chimeric molecules can be enriched for those variants that have a particular number of 
crossover events. Thus, although one can still generate 10^13 chimeric molecules during a 
reaction, each of the molecules chosen for further analysis most likely has, for example, 
only three crossover events. Because the resulting progeny population can be skewed to 
have a predetermined number of crossover events, the boundaries on the functional 
variety between the chimeric molecules are reduced. This provides a more manageable 
number of variables when calculating which oligonucleotide from the original parental 
polynucleotides might be responsible for affecting a particular trait. 

One method for creating a chimeric progeny polynucleotide sequence is to create 
oligonucleotides corresponding to fragments or portions of each parental sequence. Each 
oligonucleotide in one aspect includes a unique region of overlap so that mixing the 
oligonucleotides together results in a new variant that has each oligonucleotide fragment 
assembled in the correct order. Alternative protocols for practicing these methods of 
the invention can be found in U.S. Patent Nos. 6,773,900; 6,740,506; 6,713,282; 
6,635,449; 6,605,449; 6,537,776; 6,361,974. 

The number of oligonucleotides generated for each parental variant bears a 
relationship to the total number of resulting crossovers in the chimeric molecule that is 
ultimately created. For example, three parental nucleotide sequence variants might be 
provided to undergo a ligation reaction in order to find a chimeric variant having, for 
example, greater activity at high temperature. As one example, a set of 50 
oligonucleotide sequences can be generated corresponding to each portion of each 
parental variant. Accordingly, during the ligation reassembly process there could be up to 
50 crossover events within each of the chimeric sequences. The probability that each of 
the generated chimeric polynucleotides will contain oligonucleotides from each parental 
variant in alternating order is very low. If each oligonucleotide fragment is present in the 
ligation reaction in the same molar quantity it is likely that in some positions 
oligonucleotides from the same parental polynucleotide will ligate next to one another 
and thus not result in a crossover event. If the concentration of each oligonucleotide from 
each parent is kept constant during any ligation step in this example, there is a 1/3 chance 
(assuming 3 parents) that an oligonucleotide from the same parental variant will ligate 
within the chimeric sequence and produce no crossover. 

Accordingly, a probability density function (PDF) can be determined to predict 
the population of crossover events that are likely to occur during each step in a ligation 
reaction given a set number of parental variants, a number of oligonucleotides 
corresponding to each variant, and the concentrations of each variant during each step in 
the ligation reaction. The statistics and mathematics behind determining the PDF are 
described below. By utilizing these methods, one can calculate such a probability density 
function, and thus enrich the chimeric progeny population for a predetermined number of 
crossover events resulting from a particular ligation reaction. Moreover, a target number 
of crossover events can be predetermined, and the system then programmed to calculate 
the starting quantities of each parental oligonucleotide during each step in the ligation 
reaction to result in a probability density function that centers on the predetermined 
number of crossover events. These methods are directed to the use of repeated cycles of 
reductive reassortment, recombination and selection that allow for the directed molecular 
evolution of a nucleic acid encoding a polypeptide through recombination. This system 
allows generation of a large population of evolved chimeric sequences, wherein the 
generated population is significantly enriched for sequences that have a predetermined 
number of crossover events. A crossover event is a point in a chimeric sequence where a 
shift in sequence occurs from one parental variant to another parental variant. Such a 
point is normally at the juncture of where oligonucleotides from two parents are ligated 
together to form a single sequence. The method allows calculation of the correct 
concentrations of oligonucleotide sequences so that the final chimeric population of 
sequences is enriched for the chosen number of crossover events. This provides more 
control over choosing chimeric variants having a predetermined number of crossover 
events. 
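For the equimolar case discussed above, a crossover PDF can be sketched under a simplifying assumption that is not stated in the specification: the parent of each fragment is chosen independently, so each junction between adjacent fragments is a crossover with probability (parents - 1)/parents, and the number of crossovers follows a binomial distribution. The sketch counts junctions between adjacent fragments, so 50 fragments give 49 junctions; the function name is hypothetical.

```python
from math import comb

def crossover_pdf(n_fragments, n_parents):
    """Binomial PDF of the crossover count for equimolar parents, assuming the
    parent at each position is chosen independently (illustrative model)."""
    junctions = n_fragments - 1
    p_cross = (n_parents - 1) / n_parents  # 1 - 1/3 for three parents
    return [
        comb(junctions, k) * p_cross**k * (1 - p_cross) ** (junctions - k)
        for k in range(junctions + 1)
    ]

pdf = crossover_pdf(n_fragments=50, n_parents=3)
mean_crossovers = sum(k * p for k, p in enumerate(pdf))
p_three_or_fewer = sum(pdf[:4])  # chance of at most three crossovers
```

In this model, equimolar mixing centers the distribution at roughly two thirds of the junctions, and chimeras with only a few crossovers are vanishingly rare, which is the motivation for adjusting oligonucleotide concentrations.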

In addition, these methods provide a convenient means for exploring a tremendous 
amount of the possible protein variant space in comparison to other systems. By using 
the methods described herein, the population of chimeric molecules can be enriched for 
those variants that have a particular number of crossover events. Thus, although one can 
still generate 10^13 chimeric molecules during a reaction, each of the molecules chosen for 
further analysis most likely has, for example, only three crossover events. Because the 
resulting progeny population can be skewed to have a predetermined number of crossover 
events, the boundaries on the functional variety between the chimeric molecules are 
reduced. This provides a more manageable number of variables when calculating which 
oligonucleotide from the original parental polynucleotides might be responsible for 
affecting a particular trait. 

In one aspect, the method creates a chimeric progeny polynucleotide sequence by 
creating oligonucleotides corresponding to fragments or portions of each parental 
sequence. Each oligonucleotide in one aspect includes a unique region of overlap so that 
mixing the oligonucleotides together results in a new variant that has each oligonucleotide 
fragment assembled in the correct order. See also U.S. Patent Nos. 6,773,900; 6,740,506; 
6,713,282; 6,635,449; 6,605,449; 6,537,776; 6,361,974. 

Determining Crossover Events 

Aspects of the invention include a system and software that receive a desired 
crossover probability density function (PDF), the number of parent genes to be 
reassembled, and the number of fragments in the reassembly as inputs. The output of this 
program is a "fragment PDF" that can be used to determine a recipe for producing 
reassembled genes, and the estimated crossover PDF of those genes. The processing 
described herein is in one aspect performed in MATLAB™ (The Mathworks, Natick, 
Massachusetts), a programming language and development environment for technical 
computing. 
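The program itself is not reproduced in this section. Purely as a hedged illustration of the kind of calculation involved, the Python sketch below (not the MATLAB processing referred to above) chooses molar fractions so that the expected number of crossovers matches a target. It assumes the same mixture is used at every position, that one parent is dominant while the others share the remainder equally, and that adjacent fragments are drawn independently, so the per-junction crossover probability is one minus the sum of the squared fractions. All names are hypothetical.

```python
from math import sqrt

def parent_fractions_for_target(n_fragments, n_parents, target_mean_crossovers):
    """Return molar fractions [dominant, minor, ...] whose expected crossover
    count equals the target under the independence model described above."""
    junctions = n_fragments - 1
    c = target_mean_crossovers / junctions  # required per-junction probability
    c_max = (n_parents - 1) / n_parents     # reached by equimolar mixing
    if not 0 <= c <= c_max:
        raise ValueError("target is not reachable with this mixture design")
    k = n_parents
    # Solve q**2 + (1 - q)**2 / (k - 1) = 1 - c for the dominant fraction q.
    q = (1 + sqrt(max(0.0, (k - 1) * (k * (1 - c) - 1)))) / k
    minor = (1 - q) / (k - 1)
    return [q] + [minor] * (k - 1)

fractions = parent_fractions_for_target(50, 3, target_mean_crossovers=3)
p_cross = 1 - sum(f * f for f in fractions)
expected_crossovers = 49 * p_cross
```

Skewing the mixture strongly toward one parent lowers the per-junction crossover probability; a fuller treatment would allow different fractions at each position and match the whole target PDF rather than only its mean.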

Iterative Processes 

Any process of the invention can be iteratively repeated, e.g., a nucleic acid 
encoding an altered or new cellulase phenotype, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme of the invention, can be identified, 
re-isolated, again modified, and re-tested for activity. This process can be iteratively repeated 
until a desired phenotype is engineered. For example, an entire biochemical anabolic or 
catabolic pathway can be engineered into a cell, including, e.g., cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity. 

Similarly, if it is determined that a particular oligonucleotide has no effect at all 
on the desired trait (e.g., a new cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme phenotype), it can be removed as a variable 
by synthesizing larger parental oligonucleotides that include the sequence to be removed. 
Since incorporating the sequence within a larger sequence prevents any crossover events, 
there will no longer be any variation of this sequence in the progeny polynucleotides. 
This iterative practice of determining which oligonucleotides are most related to the 
desired trait, and which are unrelated, allows more efficient exploration of all of the possible 
protein variants that might provide a particular trait or activity. 

In vivo shuffling 

In various aspects, in vivo shuffling of molecules is used in methods of the 
invention to provide variants of polypeptides of the invention, e.g., antibodies of the 
invention or cellulases of the invention, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzymes, and the like. In vivo shuffling can be 
performed utilizing the natural property of cells to recombine multimers. While 
recombination in vivo has provided the major natural route to molecular diversity, genetic 
recombination remains a relatively complex process that involves 1) the recognition of 
homologies; 2) strand cleavage, strand invasion, and metabolic steps leading to the 
production of recombinant chiasma; and finally 3) the resolution of chiasma into discrete 
recombined molecules. The formation of the chiasma requires the recognition of 
homologous sequences. 

In another aspect, the invention includes a method for producing a hybrid 
polynucleotide from at least a first polynucleotide and a second polynucleotide. The 
invention can be used to produce a hybrid polynucleotide by introducing at least a first 
polynucleotide and a second polynucleotide (e.g., one, or both, being an exemplary 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzyme-encoding sequence of the invention) which share at least one region of partial 
sequence homology into a suitable host cell. The regions of partial sequence homology 
promote processes which result in sequence reorganization producing a hybrid 
polynucleotide. The term "hybrid polynucleotide", as used herein, is any nucleotide 
sequence which results from the method of the present invention and contains sequence 
from at least two original polynucleotide sequences. Such hybrid polynucleotides can 
result from intermolecular recombination events which promote sequence integration 
between DNA molecules. In addition, such hybrid polynucleotides can result from 
intramolecular reductive reassortment processes which utilize repeated sequences to alter 
a nucleotide sequence within a DNA molecule. 

In one aspect, in vivo reassortment focuses on "inter-molecular" processes 
collectively referred to as "recombination", which in bacteria is generally viewed as a 
"RecA-dependent" phenomenon. The invention can rely on recombination processes of a 
host cell to recombine and re-assort sequences, or the cells' ability to mediate reductive 
processes to decrease the complexity of quasi-repeated sequences in the cell by deletion. 
This process of "reductive reassortment" occurs by an "intra-molecular", 
RecA-independent process. 

In another aspect of the invention, novel polynucleotides can be generated by the 
process of reductive reassortment. The method involves the generation of constructs 
containing consecutive sequences (original encoding sequences), their insertion into an 
appropriate vector and their subsequent introduction into an appropriate host cell. The 
reassortment of the individual molecular identities occurs by combinatorial processes 
between the consecutive sequences in the construct possessing regions of homology, or 
between quasi-repeated units. The reassortment process recombines and/or reduces the 
complexity and extent of the repeated sequences and results in the production of novel 
molecular species. Various treatments may be applied to enhance the rate of 
reassortment. These could include treatment with ultra-violet light, or DNA damaging 
chemicals and/or the use of host cell lines displaying enhanced levels of "genetic 
instability". Thus the reassortment process may involve homologous recombination or 
the natural property of quasi-repeated sequences to direct their own evolution. 

Repeated or "quasi-repeated" sequences play a role in genetic instability. In one 
aspect, "quasi-repeats" are repeats that are not restricted to their original unit structure. 
Quasi-repeated units can be presented as an array of sequences in a construct; consecutive 
units of similar sequences. Once ligated, the junctions between the consecutive 
sequences become essentially invisible and the quasi-repetitive nature of the resulting 
construct is now continuous at the molecular level. The deletion process the cell 
performs to reduce the complexity of the resulting construct operates between the 
quasi-repeated sequences. The quasi-repeated units provide a practically limitless repertoire of 
templates upon which slippage events can occur. In one aspect, the constructs containing 
the quasi-repeats thus effectively provide sufficient molecular elasticity that deletion (and 
potentially insertion) events can occur virtually anywhere within the quasi-repetitive 
units. 

When the quasi-repeated sequences are all ligated in the same orientation, for 
instance head to tail or vice versa, the cell cannot distinguish individual units. 
Consequently, the reductive process can occur throughout the sequences. In contrast, 
when for example, the units are presented head to head, rather than head to tail, the 
inversion delineates the endpoints of the adjacent unit so that deletion formation will 
favor the loss of discrete units. Thus, it is preferable with the present method that the 
sequences are in the same orientation. Random orientation of quasi-repeated sequences 
will result in the loss of reassortment efficiency, while consistent orientation of the 
sequences will offer the highest efficiency. However, while having fewer of the 
contiguous sequences in the same orientation decreases the efficiency, it may still provide 
sufficient elasticity for the effective recovery of novel molecules. Constructs can be 
made with the quasi-repeated sequences in the same orientation to allow higher 
efficiency. 

Sequences can be assembled in a head to tail orientation using any of a 
variety of methods, including the following: 

a) Primers that include a poly-A head and poly-T tail which when made 
single-stranded would provide orientation can be utilized. This is accomplished by 
having the first few bases of the primers made from RNA and hence easily 
removed by RNaseH. 

b) Primers that include unique restriction cleavage sites can be utilized. 
Multiple sites, a battery of unique sequences and repeated synthesis and 
ligation steps would be required. 

c) The inner few bases of the primer could be thiolated and an exonuclease 
used to produce properly tailed molecules. 

In one aspect, the recovery of the re-assorted sequences relies on the identification 
of cloning vectors with a reduced repetitive index (RI). The re-assorted encoding 
sequences can then be recovered by amplification. The products are re-cloned and 
expressed. The recovery of cloning vectors with reduced RI can be effected by: 

1) The use of vectors only stably maintained when the construct is reduced in 
complexity. 

2) The physical recovery of shortened vectors by physical procedures. In this 
case, the cloning vector would be recovered using standard plasmid isolation 
procedures and size fractionated on either an agarose gel, or column with a low 
molecular weight cut off utilizing standard procedures. 

3) The recovery of vectors containing interrupted genes which can be selected 
when insert size decreases. 

4) The use of direct selection techniques with an expression vector and the 
appropriate selection. 

Encoding sequences (for example, genes) from related organisms may 
demonstrate a high degree of homology and encode quite diverse protein products. These 
types of sequences are particularly useful in the present invention as quasi-repeats. 
However, while the examples illustrated below demonstrate the reassortment of nearly 
identical original encoding sequences (quasi-repeats), this process is not limited to such 
nearly identical repeats. 

The following example demonstrates an exemplary method of the invention. 
Encoding nucleic acid sequences (quasi-repeats) derived from three (3) unique species are 
described. Each sequence encodes a protein with a distinct set of properties. Each of the 
sequences differs by a single or a few base pairs at a unique position in the sequence. The 
quasi-repeated sequences are separately or collectively amplified and ligated into random 
assemblies such that all possible permutations and combinations are available in the 
population of ligated molecules. The number of quasi-repeat units can be controlled by 
the assembly conditions. The average number of quasi-repeated units in a construct is 
defined as the repetitive index (RI). 
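Since the repetitive index is defined as an average, it can be computed directly from the number of quasi-repeat units counted in each construct. The counts below are hypothetical and serve only to show how a reduction in RI would be detected; they are not experimental data.

```python
from statistics import mean

def repetitive_index(unit_counts):
    """Repetitive index (RI): average number of quasi-repeat units per construct."""
    return mean(unit_counts)

# Hypothetical unit counts per construct (illustrative values only).
units_before = [4, 5, 3, 4]  # constructs as assembled
units_after = [2, 1, 3, 2]   # constructs recovered after propagation in a host
ri_before = repetitive_index(units_before)
ri_after = repetitive_index(units_after)
ri_reduced = ri_after < ri_before
```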

Once formed, the constructs may, or may not be size fractionated on an agarose 
gel according to published protocols, inserted into a cloning vector and transfected into an 
appropriate host cell. The cells are then propagated and "reductive reassortment" is 
effected. The rate of the reductive reassortment process may be stimulated by the 
introduction of DNA damage if desired. Whether the reduction in RI is mediated by 
deletion formation between repeated sequences by an "intra-molecular" mechanism, or 
mediated by recombination-like events through "inter-molecular" mechanisms is 
immaterial. The end result is a reassortment of the molecules into all possible 
combinations. 

Optionally, the method comprises the additional step of screening the library 
members of the shuffled pool to identify individual shuffled library members having the 
ability to bind or otherwise interact, or catalyze a particular reaction (e.g., such as a 
catalytic domain of an enzyme) with a predetermined macromolecule, such as for 
example a proteinaceous receptor, an oligosaccharide, virion, or other predetermined 
compound or structure. 

The polypeptides that are identified from such libraries can be used for 
therapeutic, diagnostic, research and related purposes (e.g., catalysts, solutes for 
increasing osmolarity of an aqueous solution and the like) and/or can be subjected to one 
or more additional cycles of shuffling and/or selection. 

In another aspect, it is envisioned that prior to or during recombination or 
reassortment, polynucleotides generated by the method of the invention can be subjected 
to agents or processes which promote the introduction of mutations into the original 
polynucleotides. The introduction of such mutations would increase the diversity of 
resulting hybrid polynucleotides and polypeptides encoded therefrom. The agents or 
processes which promote mutagenesis can include, but are not limited to: (+)-CC-1065, 
or a synthetic analog such as (+)-CC-1065-(N3-Adenine) (See Sun and Hurley, (1992)); an 
N-acetylated or deacetylated 4'-fluoro-4-aminobiphenyl adduct capable of inhibiting DNA 
synthesis (See, for example, van de Poll et al. (1992)); or an N-acetylated or deacetylated 
4-aminobiphenyl adduct capable of inhibiting DNA synthesis (See also, van de Poll et al. 
(1992), pp. 751-758); trivalent chromium, a trivalent chromium salt, a polycyclic 
aromatic hydrocarbon (PAH) DNA adduct capable of inhibiting DNA replication, such as 
7-bromomethyl-benz[a]anthracene ("BMA"), tris(2,3-dibromopropyl)phosphate ("Tris- 
BP"), 1,2-dibromo-3-chloropropane ("DBCP"), 2-bromoacrolein (2BA), benzo[a]pyrene- 
7,8-dihydrodiol-9-10-epoxide ("BPDE"), a platinum(II) halogen salt, N-hydroxy-2- 
amino-3-methylimidazo[4,5-f]-quinoline ("N-hydroxy-IQ") and N-hydroxy-2-amino-1- 
methyl-6-phenylimidazo[4,5-f]-pyridine ("N-hydroxy-PhIP"). Exemplary means for 
slowing or halting PCR amplification consist of UV light (+)-CC-1065 and (+)-CC-1065- 
(N3-Adenine). Particularly encompassed means are DNA adducts or polynucleotides 
comprising the DNA adducts from the polynucleotides or polynucleotides pool, which 
can be released or removed by a process including heating the solution comprising the 
polynucleotides prior to further processing. 

In another aspect the invention is directed to a method of producing recombinant 
proteins having biological activity by treating a sample comprising double-stranded 
template polynucleotides encoding a wild-type protein under conditions according to the 
invention which provide for the production of hybrid or re-assorted polynucleotides. 

Producing sequence variants 

The invention also provides additional methods for making sequence variants of 
the nucleic acid (e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase enzyme) sequences of the invention. The invention also provides 
additional methods for isolating cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzymes using the nucleic acids and polypeptides of 
the invention. In one aspect, the invention provides for variants of a cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme coding 
sequence (e.g., a gene, cDNA or message) of the invention, which can be altered by any 
means, including, e.g., random or stochastic methods, or, non-stochastic, or "directed 
evolution," methods, as described above.

The isolated variants may be naturally occurring. Variants can also be created in
vitro. Variants may be created using genetic engineering techniques such as site directed
mutagenesis, random chemical mutagenesis, Exonuclease III deletion procedures, and
standard cloning techniques. Alternatively, such variants, fragments, analogs, or
derivatives may be created using chemical synthesis or modification procedures. Other
methods of making variants are also familiar to those skilled in the art. These include
procedures in which nucleic acid sequences obtained from natural isolates are modified to
generate nucleic acids which encode polypeptides having characteristics which enhance
their value in industrial or laboratory applications. In such procedures, a large number of
variant sequences having one or more nucleotide differences with respect to the sequence
obtained from the natural isolate are generated and characterized. These nucleotide
differences can result in amino acid changes with respect to the polypeptides encoded by
the nucleic acids from the natural isolates.

For example, variants may be created using error prone PCR. In one aspect of
error prone PCR, the PCR is performed under conditions where the copying fidelity of the
DNA polymerase is low, such that a high rate of point mutations is obtained along the
entire length of the PCR product. Error prone PCR is described, e.g., in Leung (1989)
Technique 1:11-15 and Caldwell (1992) PCR Methods Applic. 2:28-33. Briefly, in such
procedures, nucleic acids to be mutagenized are mixed with PCR primers, reaction buffer,
MgCl2, MnCl2, Taq polymerase and an appropriate concentration of dNTPs for achieving
a high rate of point mutation along the entire length of the PCR product. For example,
the reaction may be performed using 20 fmoles of nucleic acid to be mutagenized, 30
pmole of each PCR primer, a reaction buffer comprising 50mM KCl, 10mM Tris HCl
(pH 8.3) and 0.01% gelatin, 7mM MgCl2, 0.5mM MnCl2, 5 units of Taq polymerase,
0.2mM dGTP, 0.2mM dATP, 1mM dCTP, and 1mM dTTP. PCR may be performed for
30 cycles of 94°C for 1 min, 45°C for 1 min, and 72°C for 1 min. However, it will be
appreciated that these parameters may be varied as appropriate. The mutagenized nucleic
acids are cloned into an appropriate vector and the activities of the polypeptides encoded
by the mutagenized nucleic acids are evaluated.
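
The effect of low-fidelity copying on a template can be illustrated with a minimal
in silico sketch. It is only an illustration under stated assumptions: the per-base
error rate, the uniform choice of replacement base, and the function names (mutate,
error_prone_library, count_differences) are hypothetical and are not derived from the
reaction conditions given above; real polymerases show biased mutation spectra.

```python
import random

BASES = "ACGT"


def mutate(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a template, swapping each base for a different one with probability error_rate.

    The uniform, position-independent error model is an assumption made for illustration.
    """
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # A point mutation: pick any base other than the one being copied.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def error_prone_library(template: str, error_rate: float, size: int, seed: int = 0) -> list:
    """Generate a library of variant sequences from a single template."""
    rng = random.Random(seed)
    return [mutate(template, error_rate, rng) for _ in range(size)]


def count_differences(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)
```

In such a sketch, each member of the returned library corresponds to one clone that
would then be expressed and assayed for activity.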

In one aspect, variants are created using oligonucleotide directed mutagenesis to
generate site-specific mutations in any cloned DNA of interest. Oligonucleotide
mutagenesis is described, e.g., in Reidhaar-Olson (1988) Science 241:53-57. Briefly, in
such procedures a plurality of double stranded oligonucleotides bearing one or more
mutations to be introduced into the cloned DNA are synthesized and inserted into the
cloned DNA to be mutagenized. In one aspect, clones containing the mutagenized DNA
are recovered, expressed, and the activities of the polypeptide encoded therein assessed.

Another method for generating variants is assembly PCR. Assembly PCR
involves the assembly of a PCR product from a mixture of small DNA fragments. A large
number of different PCR reactions occur in parallel in the same vial, with the products of
one reaction priming the products of another reaction. Assembly PCR is described in,
e.g., U.S. Patent No. 5,965,408.

In one aspect, sexual PCR mutagenesis is an exemplary method of generating
variants of the invention. In one aspect of sexual PCR mutagenesis, forced homologous
recombination occurs between DNA molecules of different but highly related DNA
sequence in vitro, as a result of random fragmentation of the DNA molecule based on
sequence homology, followed by fixation of the crossover by primer extension in a PCR
reaction. Sexual PCR mutagenesis is described, e.g., in Stemmer (1994) Proc. Natl.
Acad. Sci. USA 91:10747-10751. Briefly, in such procedures a plurality of nucleic acids
to be recombined are digested with DNase to generate fragments having an average size
of 50-200 nucleotides. Fragments of the desired average size are purified and
resuspended in a PCR mixture. PCR is conducted under conditions which facilitate
recombination between the nucleic acid fragments. For example, PCR may be performed
by resuspending the purified fragments at a concentration of 10-30 ng/µl in a solution of
0.2mM of each dNTP, 2.2mM MgCl2, 50mM KCl, 10mM Tris HCl, pH 9.0, and 0.1%
Triton X-100. 2.5 units of Taq polymerase per 100 µl of reaction mixture is added and
PCR is performed using the following regime: 94°C for 60 seconds, 94°C for 30 seconds,
50-55°C for 30 seconds, 72°C for 30 seconds (30-45 times) and 72°C for 5 minutes.
However, it will be appreciated that these parameters may be varied as appropriate. In
some aspects, oligonucleotides may be included in the PCR reactions. In other aspects,
the Klenow fragment of DNA polymerase I may be used in a first set of PCR reactions
and Taq polymerase may be used in a subsequent set of PCR reactions. Recombinant
sequences are isolated and the activities of the polypeptides they encode are assessed.

In one aspect, variants are created by in vivo mutagenesis. In some aspects,
random mutations in a sequence of interest are generated by propagating the sequence of
interest in a bacterial strain, such as an E. coli strain, which carries mutations in one or
more of the DNA repair pathways. Such "mutator" strains have a higher random
mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains
will eventually generate random mutations within the DNA. Mutator strains suitable for
use for in vivo mutagenesis are described in PCT Publication No. WO 91/16427,
published October 31, 1991, entitled "Methods for Phenotype Creation from Multiple
Gene Populations".

Variants may also be generated using cassette mutagenesis. In cassette 
mutagenesis a small region of a double stranded DNA molecule is replaced with a 
synthetic oligonucleotide "cassette" that differs from the native sequence. The 
oligonucleotide often contains completely and/or partially randomized native sequence. 

Recursive ensemble mutagenesis may also be used to generate variants. Recursive 
ensemble mutagenesis is an algorithm for protein engineering (protein mutagenesis) 
developed to produce diverse populations of phenotypically related mutants whose 
members differ in amino acid sequence. This method uses a feedback mechanism to 
control successive rounds of combinatorial cassette mutagenesis. Recursive ensemble
mutagenesis is described, e.g., in Arkin (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815. 

In some aspects, variants are created using exponential ensemble mutagenesis. 
Exponential ensemble mutagenesis is a process for generating combinatorial libraries 
with a high percentage of unique and functional mutants, wherein small groups of 
residues are randomized in parallel to identify, at each altered position, amino acids 
which lead to functional proteins. Exponential ensemble mutagenesis is described, e.g., 
in Delegrave (1993) Biotechnology Res. 11:1548-1552. Random and site-directed
mutagenesis are described, e.g., in Arnold (1993) Current Opinion in Biotechnology 
4:450-455. 

In some aspects, the variants are created using shuffling procedures wherein 
portions of a plurality of nucleic acids which encode distinct polypeptides are fused 
together to create chimeric nucleic acid sequences which encode chimeric polypeptides as 
described in U.S. Patent No. 5,965,408, filed July 9, 1996, entitled, "Method of DNA
Reassembly by Interrupting Synthesis" and U.S. Patent No. 5,939,250, filed May 22,
1996, entitled, "Production of Enzymes Having Desired Activities by Mutagenesis."

The variants of the polypeptides of the invention may be variants in which one or
more of the amino acid residues of the polypeptides of the sequences of the invention are 
substituted with a conserved or non-conserved amino acid residue (in one aspect a 
conserved amino acid residue) and such substituted amino acid residue may or may not be 
one encoded by the genetic code. 

In one aspect, conservative substitutions are those that substitute a given amino 
acid in a polypeptide by another amino acid of like characteristics. In one aspect,
conservative substitutions of the invention comprise the following replacements:
replacements of an aliphatic amino acid such as Alanine, Valine, Leucine and Isoleucine
with another aliphatic amino acid; replacement of a Serine with a Threonine or vice versa;
replacement of an acidic residue such as Aspartic acid and Glutamic acid with another
acidic residue; replacement of a residue bearing an amide group, such as Asparagine and
Glutamine, with another residue bearing an amide group; exchange of a basic residue
such as Lysine and Arginine with another basic residue; and replacement of an aromatic
residue such as Phenylalanine and Tyrosine with another aromatic residue.
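
The replacement classes listed above can be expressed as a small lookup, sketched
here in Python. This is an illustrative sketch only: the group labels and the
function names (is_conservative, classify_substitutions) are hypothetical, and the
groups contain only the residues named in the preceding paragraph, so any residue
the text does not name is treated as having no conservative partner.

```python
# Residue classes taken from the paragraph above, in one-letter code.
# The dictionary keys are illustrative labels, not terms from the text.
CONSERVATIVE_GROUPS = {
    "aliphatic": {"A", "V", "L", "I"},
    "serine_threonine": {"S", "T"},
    "acidic": {"D", "E"},
    "amide": {"N", "Q"},
    "basic": {"K", "R"},
    "aromatic": {"F", "Y"},  # only the aromatic residues named in the text
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if both residues differ and fall in the same listed class."""
    if original == replacement:
        return False
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


def classify_substitutions(wild_type: str, variant: str) -> list:
    """List (position, wild-type residue, variant residue, conservative?) per change.

    Positions are 1-based; both sequences must be the same length.
    """
    if len(wild_type) != len(variant):
        raise ValueError("sequences must be the same length")
    return [
        (pos, wt, var, is_conservative(wt, var))
        for pos, (wt, var) in enumerate(zip(wild_type, variant), start=1)
        if wt != var
    ]
```

For example, comparing the hypothetical peptides "MKLS" and "MRLT" reports a
Lysine to Arginine change at position 2 and a Serine to Threonine change at
position 4, both conservative under the classes above.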

Other variants are those in which one or more of the amino acid residues of a 
polypeptide of the invention includes a substituent group. In one aspect, other variants 
are those in which the polypeptide is associated with another compound, such as a 
compound to increase the half-life of the polypeptide (for example, polyethylene glycol). 
Additional variants are those in which additional amino acids are fused to the polypeptide, 
such as a leader sequence, a secretory sequence, a proprotein sequence or a sequence 
which facilitates purification, enrichment, or stabilization of the polypeptide. 

In some aspects, the fragments, derivatives and analogs retain the same biological 
function or activity as the polypeptides of the invention. In other aspects, the fragment,
derivative, or analog includes a proprotein, such that the fragment, derivative, or analog
can be activated by cleavage of the proprotein portion to produce an active polypeptide. 

Optimizing codons to achieve high levels of protein expression in host cells 

The invention provides methods for modifying cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase, enzyme-encoding nucleic acids to 
modify (e.g., optimize) codon usage. In one aspect, the invention provides methods for 
modifying codons in a nucleic acid encoding a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme to increase or decrease its 
expression in a host cell. The invention also provides nucleic acids encoding a cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
modified to increase its expression in a host cell, cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme so modified, and methods 
of making the modified cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzymes. The method comprises identifying a "non-preferred"
or a "less preferred" codon in cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase, enzyme-encoding nucleic acid and replacing one or
more of these non-preferred or less preferred codons with a "preferred codon" encoding
the same amino acid as the replaced codon and at least one non-preferred or less
preferred codon in the nucleic acid has been replaced by a preferred codon encoding the
same amino acid. A preferred codon is a codon over-represented in coding sequences in
genes in the host cell and a non-preferred or less preferred codon is a codon
under-represented in coding sequences in genes in the host cell.
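
The replacement of less preferred codons by preferred synonymous codons can be
sketched as follows. This is a minimal illustration, not the method of the
invention: the function names (optimize_codons, translate) are hypothetical, the
standard genetic code is assumed, and the host usage values passed in by the caller
are placeholders rather than measured frequencies for any real host cell.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(coding_seq: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[coding_seq[i:i + 3]] for i in range(0, len(coding_seq), 3))


def optimize_codons(coding_seq: str, usage: dict) -> str:
    """Replace each codon with the synonymous codon most used in the host.

    `usage` maps codon -> relative frequency in the host's coding sequences
    (hypothetical values supplied by the caller; missing codons count as 0.0).
    A codon is replaced only if a synonym has strictly higher usage, so the
    encoded amino acid sequence is unchanged.
    """
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    optimized = []
    for i in range(0, len(coding_seq), 3):
        codon = coding_seq[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        preferred = max(synonyms, key=lambda c: usage.get(c, 0.0))
        if usage.get(preferred, 0.0) > usage.get(codon, 0.0):
            optimized.append(preferred)
        else:
            optimized.append(codon)
    return "".join(optimized)
```

With a placeholder usage table in which CTG, CGT and AAA are over-represented, the
sequence ATGCTAAGAAAG would be rewritten as ATGCTGCGTAAA while still encoding the
same four residues.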

Host cells for expressing the nucleic acids, expression cassettes and vectors of the
invention include bacteria, yeast, fungi, plant cells, insect cells and mammalian cells (see
discussion, above). Thus, the invention provides methods for optimizing codon usage in
all of these cells, codon-altered nucleic acids and polypeptides made by the codon-altered
nucleic acids. Exemplary host cells include gram negative bacteria, such as Escherichia
coli; gram positive bacteria, such as Streptomyces sp., Lactobacillus gasseri, Lactococcus
lactis, Lactococcus cremoris, Bacillus subtilis, Bacillus cereus. Exemplary host cells also
include eukaryotic organisms, e.g., various yeast, such as Saccharomyces sp., including
Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, and
Kluyveromyces lactis, Hansenula polymorpha, Aspergillus niger, and mammalian cells
and cell lines and insect cells and cell lines. Thus, the invention also includes nucleic
acids and polypeptides optimized for expression in these organisms and species.

For example, the codons of a nucleic acid encoding a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme isolated
from a bacterial cell are modified such that the nucleic acid is optimally expressed in a
bacterial cell different from the bacteria from which the cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme was derived, a yeast, a
fungus, a plant cell, an insect cell or a mammalian cell. Methods for optimizing codons are
well known in the art, see, e.g., U.S. Patent No. 5,795,737; Baca (2000) Int. J. Parasitol.
30:113-118; Hale (1998) Protein Expr. Purif. 12:185-188; Narum (2001) Infect. Immun.
69:7250-7253. See also Narum (2001) Infect. Immun. 69:7250-7253, describing
optimizing codons in mouse systems; Outchkourov (2002) Protein Expr. Purif. 24:18-24,
describing optimizing codons in yeast; Feng (2000) Biochemistry 39:15399-15409,
describing optimizing codons in E. coli; Humphreys (2000) Protein Expr. Purif. 20:252-
264, describing optimizing codon usage that affects secretion in E. coli.

Transgenic non-human animals 

The invention provides transgenic non-human animals comprising a nucleic acid, 
a polypeptide (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase enzyme), an expression cassette or vector or a transfected or transformed
cell of the invention. The invention also provides methods of making and using these
transgenic non-human animals. 

The transgenic non-human animals can be, e.g., dogs, goats, rabbits, sheep, pigs
(including all swine, hogs and related animals), cows, rats and mice, comprising the
nucleic acids of the invention. These animals can be used, e.g., as in vivo models to study
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme activity, or, as models to screen for agents that change the cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity in
vivo. The coding sequences for the polypeptides to be expressed in the transgenic
non-human animals can be designed to be constitutive, or, under the control of tissue-specific,
developmental-specific or inducible transcriptional regulatory factors.

Transgenic non-human animals can be designed and generated using any method
known in the art; see, e.g., U.S. Patent Nos. 6,211,428; 6,187,992; 6,156,952; 6,118,044;
6,111,166; 6,107,541; 5,959,171; 5,922,854; 5,892,070; 5,880,327; 5,891,698; 5,639,940;
5,573,933; 5,387,742; 5,087,571, describing making and using transformed cells and eggs
and transgenic mice, rats, rabbits, sheep, pigs and cows. See also, e.g., Pollock (1999) J.
Immunol. Methods 231:147-157, describing the production of recombinant proteins in the
milk of transgenic dairy animals; Baguisi (1999) Nat. Biotechnol. 17:456-461,
demonstrating the production of transgenic goats. U.S. Patent No. 6,211,428, describes
making and using transgenic non-human mammals which express in their brains a nucleic
acid construct comprising a DNA sequence. U.S. Patent No. 5,387,742, describes
injecting cloned recombinant or synthetic DNA sequences into fertilized mouse eggs,
implanting the injected eggs in pseudo-pregnant females, and growing to term transgenic
mice. U.S. Patent No. 6,187,992, describes making and using a transgenic mouse.

"Knockout animals" can also be used to practice the methods of the invention.
For example, in one aspect, the transgenic or modified animals of the invention comprise
a "knockout animal," e.g., a "knockout mouse," engineered not to express an endogenous
gene, which is replaced with a gene expressing a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme of the invention, or, a
fusion protein comprising a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme of the invention.

Transgenic Plants and Seeds 

The invention provides transgenic plants and seeds comprising a nucleic acid, a
polypeptide (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme), an expression cassette or vector or a transfected or transformed
cell of the invention. The invention also provides plant products, e.g., oils, seeds, leaves,
extracts and the like, comprising a nucleic acid and/or a polypeptide (e.g., a cellulase,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme) of
the invention. The transgenic plant can be dicotyledonous (a dicot) or monocotyledonous
(a monocot). The invention also provides methods of making and using these transgenic
plants and seeds. The transgenic plant or plant cell expressing a polypeptide of the
present invention may be constructed in accordance with any method known in the art.
See, for example, U.S. Patent No. 6,309,872.

Nucleic acids and expression constructs of the invention can be introduced into a
plant cell by any means. For example, nucleic acids or expression constructs can be
introduced into the genome of a desired plant host, or, the nucleic acids or expression
constructs can be episomes. Introduction into the genome of a desired plant can be such
that the host's cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme production is regulated by endogenous transcriptional or translational
control elements. The invention also provides "knockout plants" where insertion of gene
sequence by, e.g., homologous recombination, has disrupted the expression of the
endogenous gene. Means to generate "knockout" plants are well-known in the art, see,
e.g., Strepp (1998) Proc Natl. Acad. Sci. USA 95:4368-4373; Miao (1995) Plant J 7:359-
365. See discussion on transgenic plants, below.

The nucleic acids of the invention can be used to confer desired traits on
essentially any plant, e.g., on starch-producing plants, such as potato, tomato, soybean,
beets, corn, wheat, rice, barley, and the like. Nucleic acids of the invention can be used to
manipulate metabolic pathways of a plant in order to optimize or alter the host's expression
of cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme. This can change cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme activity in a plant. Alternatively, a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme of the
invention can be used in production of a transgenic plant to produce a compound not
naturally produced by that plant. This can lower production costs or create a novel
product.

In one aspect, the first step in production of a transgenic plant involves making an
expression construct for expression in a plant cell. These techniques are well known in
the art. They can include selecting and cloning a promoter, a coding sequence for
facilitating efficient binding of ribosomes to mRNA and selecting the appropriate gene
terminator sequences. One exemplary constitutive promoter is CaMV35S, from the
cauliflower mosaic virus, which generally results in a high degree of expression in plants.
Other promoters are more specific and respond to cues in the plant's internal or external
environment. An exemplary light-inducible promoter is the promoter from the cab gene,
encoding the major chlorophyll a/b binding protein.

In one aspect, the nucleic acid is modified to achieve greater expression in a plant
cell. For example, a sequence of the invention is likely to have a higher percentage of
A-T nucleotide pairs compared to that seen in a plant, some of which prefer G-C nucleotide
pairs. Therefore, A-T nucleotides in the coding sequence can be substituted with G-C
nucleotides without significantly changing the amino acid sequence to enhance
production of the gene product in plant cells.
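
One way to picture the substitution of A-T nucleotides with G-C nucleotides without
changing the encoded protein is to swap each codon for a synonymous codon richer in
G and C. The sketch below is only an illustration under stated assumptions: the
function names (gc_content, enrich_gc, translate) are hypothetical, the standard
genetic code is assumed, and picking the most G-C-rich synonym at every position is a
simplification that ignores the codon usage of any particular plant.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def gc_count(seq: str) -> int:
    """Number of G and C nucleotides in a sequence."""
    return sum(1 for base in seq if base in "GC")


def gc_content(seq: str) -> float:
    """Fraction of nucleotides in the sequence that are G or C."""
    return gc_count(seq) / len(seq)


def translate(coding_seq: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[coding_seq[i:i + 3]] for i in range(0, len(coding_seq), 3))


def enrich_gc(coding_seq: str) -> str:
    """Swap each codon for a synonymous codon with more G-C bases, if one exists.

    The amino acid sequence is preserved because only synonymous codons are used.
    """
    if len(coding_seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    enriched = []
    for i in range(0, len(coding_seq), 3):
        codon = coding_seq[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        richest = max(synonyms, key=gc_count)
        enriched.append(richest if gc_count(richest) > gc_count(codon) else codon)
    return "".join(enriched)
```

Applied to the hypothetical sequence ATGAAATTAGCT, this raises the G-C fraction from
0.25 to about 0.58 while the translated peptide stays the same.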

Selectable marker genes can be added to the gene construct in order to identify
plant cells or tissues that have successfully integrated the transgene. This may be
necessary because achieving incorporation and expression of genes in plant cells is a rare
event, occurring in just a few percent of the targeted tissues or cells. Selectable marker
genes encode proteins that provide resistance to agents that are normally toxic to plants,
such as antibiotics or herbicides. Only plant cells that have integrated the selectable
marker gene will survive when grown on a medium containing the appropriate antibiotic
or herbicide. As for other inserted genes, marker genes also require promoter and
termination sequences for proper function.

In one aspect, making transgenic plants or seeds comprises incorporating
sequences of the invention and, optionally, marker genes into a target expression
construct (e.g., a plasmid), along with positioning of the promoter and the terminator
sequences. This can involve transferring the modified gene into the plant through a
suitable method. For example, a construct may be introduced directly into the genomic
DNA of the plant cell using techniques such as electroporation and microinjection of
plant cell protoplasts, or the constructs can be introduced directly to plant tissue using
ballistic methods, such as DNA particle bombardment. See, e.g., Christou
(1997) Plant Mol. Biol. 35:197-203; Pawlowski (1996) Mol. Biotechnol. 6:17-30; Klein
(1987) Nature 327:70-73; Takumi (1997) Genes Genet. Syst. 72:63-69, discussing use of
particle bombardment to introduce transgenes into wheat; and Adam (1997) supra, for use
of particle bombardment to introduce YACs into plant cells. For example, Rinehart
(1997) supra, used particle bombardment to generate transgenic cotton plants. Apparatus
for accelerating particles is described in U.S. Pat. No. 5,015,580; and, the commercially
available BioRad (Biolistics) PDS-2000 particle acceleration instrument; see also, John,
U.S. Patent No. 5,608,148; and Ellis, U.S. Patent No. 5,681,730, describing
particle-mediated transformation of gymnosperms.

In one aspect, protoplasts can be immobilized and injected with nucleic acids,
e.g., an expression construct. Although plant regeneration from protoplasts is not easy
with cereals, plant regeneration is possible in legumes using somatic embryogenesis from
protoplast derived callus. Organized tissues can be transformed with naked DNA using
gene gun technique, where DNA is coated on tungsten microprojectiles, shot 1/100th the
size of cells, which carry the DNA deep into cells and organelles. Transformed tissue is
then induced to regenerate, usually by somatic embryogenesis. This technique has been
successful in several cereal species including maize and rice.

Nucleic acids, e.g., expression constructs, can also be introduced into plant cells
using recombinant viruses. Plant cells can be transformed using viral vectors, such as,
e.g., tobacco mosaic virus derived vectors (Rouwendal (1997) Plant Mol. Biol. 33:989-
999), see Porta (1996) "Use of viral replicons for the expression of genes in plants," Mol.
Biotechnol. 5:209-221.

Alternatively, nucleic acids, e.g., an expression construct, can be combined with
suitable T-DNA flanking regions and introduced into a conventional Agrobacterium
tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host
will direct the insertion of the construct and adjacent marker into the plant cell DNA
when the cell is infected by the bacteria. Agrobacterium tumefaciens-mediated
transformation techniques, including disarming and use of binary vectors, are well
described in the scientific literature. See, e.g., Horsch (1984) Science 233:496-498;
Fraley (1983) Proc. Natl. Acad. Sci. USA 80:4803 (1983); Gene Transfer to Plants,
Potrykus, ed. (Springer-Verlag, Berlin 1995). The DNA in an A. tumefaciens cell is
contained in the bacterial chromosome as well as in another structure known as a Ti
(tumor-inducing) plasmid. The Ti plasmid contains a stretch of DNA termed T-DNA (~20
kb long) that is transferred to the plant cell in the infection process and a series of vir
(virulence) genes that direct the infection process. A. tumefaciens can only infect a plant
through wounds: when a plant root or stem is wounded it gives off certain chemical
signals, in response to which, the vir genes of A. tumefaciens become activated and direct
a series of events necessary for the transfer of the T-DNA from the Ti plasmid to the
plant's chromosome. The T-DNA then enters the plant cell through the wound. One
speculation is that the T-DNA waits until the plant DNA is being replicated or
transcribed, then inserts itself into the exposed plant DNA. In order to use A. tumefaciens
as a transgene vector, the tumor-inducing section of T-DNA has to be removed, while
retaining the T-DNA border regions and the vir genes. The transgene is then inserted
between the T-DNA border regions, where it is transferred to the plant cell and becomes
integrated into the plant's chromosomes.

The invention provides for the transformation of monocotyledonous plants using
the nucleic acids of the invention, including important cereals, see Hiei (1997) Plant Mol.
Biol. 35:205-218. See also, e.g., Horsch, Science (1984) 233:496; Fraley (1983) Proc.
Natl. Acad. Sci. USA 80:4803; Thykjaer (1997) supra; Park (1996) Plant Mol. Biol.
32:1135-1148, discussing T-DNA integration into genomic DNA. See also D'Halluin,
U.S. Patent No. 5,712,135, describing a process for the stable integration of a DNA
comprising a gene that is functional in a cell of a cereal, or other monocotyledonous
plant.

In one aspect, the third step involves selection and regeneration of whole plants
capable of transmitting the incorporated target gene to the next generation. Such
regeneration techniques may use manipulation of certain phytohormones in a tissue
culture growth medium. In one aspect, the method uses a biocide and/or herbicide marker
that has been introduced together with the desired nucleotide sequences. Plant
regeneration from cultured protoplasts is described in Evans et al., Protoplasts Isolation
and Culture, Handbook of Plant Cell Culture, pp. 124-176, MacMillan Publishing
Company, New York, 1983; and Binding, Regeneration of Plants, Plant Protoplasts, pp.
21-73, CRC Press, Boca Raton, 1985. Regeneration can also be obtained from plant
callus, explants, organs, or parts thereof. Such regeneration techniques are described
generally in Klee (1987) Ann. Rev. of Plant Phys. 38:467-486. To obtain whole plants
from transgenic tissues such as immature embryos, they can be grown under controlled
environmental conditions in a series of media containing nutrients and hormones, a
process known as tissue culture. Once whole plants are generated and produce seed,
evaluation of the progeny begins.

In one aspect, after the expression cassette is stably incorporated in transgenic
plants, it can be introduced into other plants by sexual crossing. Any of a number of
standard breeding techniques can be used, depending upon the species to be crossed.
Since transgenic expression of the nucleic acids of the invention leads to phenotypic
changes, plants comprising the recombinant nucleic acids of the invention can be sexually
crossed with a second plant to obtain a final product. Thus, the seed of the invention can
be derived from a cross between two transgenic plants of the invention, or a cross
between a plant of the invention and another plant. The desired effects (e.g., expression
of the polypeptides of the invention to produce a plant in which flowering behavior is
altered) can be enhanced when both parental plants express the polypeptides (e.g., a
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme) of the invention. The desired effects can be passed to future plant generations
by standard propagation means.

In one aspect, the nucleic acids and polypeptides of the invention are expressed in
or inserted in any plant or seed. Transgenic plants of the invention can be dicotyledonous
or monocotyledonous. Examples of monocot transgenic plants of the invention are
grasses, such as meadow grass (blue grass, Poa), forage grass such as festuca, lolium,
temperate grass, such as Agrostis, and cereals, e.g., wheat, oats, rye, barley, rice,
sorghum, and maize (corn). Examples of dicot transgenic plants of the invention are
tobacco, legumes, such as lupins, potato, sugar beet, pea, bean and soybean, and
cruciferous plants (family Brassicaceae), such as cauliflower, rape seed, and the closely
related model organism Arabidopsis thaliana. Thus, the transgenic plants and seeds of
the invention include a broad range of plants, including, but not limited to, species from
the genera Anacardium, Arachis, Asparagus, Atropa, Avena, Brassica, Citrus, Citrullus,
Capsicum, Carthamus, Cocos, Coffea, Cucumis, Cucurbita, Daucus, Elaeis, Fragaria,
Glycine, Gossypium, Helianthus, Heterocallis, Hordeum, Hyoscyamus, Lactuca, Linum,
Lolium, Lupinus, Lycopersicon, Malus, Manihot, Majorana, Medicago, Nicotiana, Olea,
Oryza, Panicum, Pannisetum, Persea, Phaseolus, Pistachia, Pisum, Pyrus, Prunus,
Raphanus, Ricinus, Secale, Senecio, Sinapis, Solanum, Sorghum, Theobromus,
Trigonella, Triticum, Vicia, Vitis, Vigna, and Zea.

In alternative embodiments, the nucleic acids of the invention are expressed in
plants which contain fiber cells, including, e.g., cotton, silk cotton tree (Kapok, Ceiba
pentandra), desert willow, creosote bush, winterfat, balsa, ramie, kenaf, hemp, roselle,
jute, sisal abaca and flax. In alternative embodiments, the transgenic plants of the
invention can be members of the genus Gossypium, including members of any Gossypium
species, such as G. arboreum, G. herbaceum, G. barbadense, and G. hirsutum.

The invention also provides for transgenic plants to be used for producing large
amounts of the polypeptides (e.g., a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme or antibody) of the invention. For example,
see Palmgren (1997) Trends Genet. 13:348; Chong (1997) Transgenic Res. 6:289-296
(producing human milk protein beta-casein in transgenic potato plants using an
auxin-inducible, bidirectional mannopine synthase (mas1',2') promoter with
Agrobacterium tumefaciens-mediated leaf disc transformation methods).

Using known procedures, one of skill can screen for plants of the invention by 
detecting the increase or decrease of transgene mRNA or protein in transgenic plants. 
Means for detecting and quantitating mRNAs or proteins are well known in the art.

Polypeptides and peptides

In one aspect, the invention provides isolated or recombinant polypeptides having 
a sequence identity (e.g., at least about 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 
58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 
73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 
88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more, or 
complete (100%) sequence identity, or homology) to an exemplary sequence of the 
invention, e.g., proteins having a sequence as set forth in SEQ ID NO:2, SEQ ID NO:4, 
SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID
NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID
NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID 
NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID 
NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID 
NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID 
NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID 
NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID 
NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID 
NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID
NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID
NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID
NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID
NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:144, SEQ ID
NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID
NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164 or SEQ
ID NO:166 (see also Tables 1, 2, and 3, Examples 1 and 4, below, and Sequence
Listing). The percent sequence identity can be over the full length of the polypeptide, or
the identity can be over a region of at least about 50, 60, 70, 80, 90, 100, 150, 200, 250,
300, 350, 400, 450, 500, 550, 600, 650, 700 or more residues.
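
As an illustration only, percent sequence identity over the full length or over a region of a chosen number of residues can be sketched as follows. The sketch assumes the two sequences have already been aligned by some sequence comparison algorithm and are supplied as equal-length strings with "-" marking gaps. The function names, the gap symbol, and the choice to count only columns in which both sequences have a residue are assumptions of this sketch and are not taken from the specification; other conventions for the denominator exist.

```python
def _match_flags(aligned_a: str, aligned_b: str, gap: str = "-") -> list[bool]:
    """One flag per aligned column in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return [a == b for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over all ungapped columns of the alignment."""
    flags = _match_flags(aligned_a, aligned_b, gap)
    if not flags:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * sum(flags) / len(flags)


def best_region_identity(aligned_a: str, aligned_b: str, region: int,
                         gap: str = "-") -> float:
    """Highest percent identity over any run of `region` consecutive ungapped columns."""
    flags = _match_flags(aligned_a, aligned_b, gap)
    if region < 1 or region > len(flags):
        raise ValueError("region must be between 1 and the number of aligned pairs")
    matches = sum(flags[:region])          # matches in the first window
    best = matches
    for i in range(region, len(flags)):    # slide the window one column at a time
        matches += flags[i] - flags[i - region]
        best = max(best, matches)
    return 100.0 * best / region
```

For example, two aligned ten-residue sequences differing at one position give `percent_identity` of 90.0, while `best_region_identity` reports the best-matching region of the requested size.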

Polypeptides of the invention can also be shorter than the full length of exemplary
polypeptides. In alternative aspects, the invention provides polypeptides (peptides,
fragments) ranging in size between about 5 and the full length of a polypeptide, e.g., an
enzyme, such as a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme; exemplary sizes being of about 5, 10, 15, 20, 25, 30, 35, 40, 45,
50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500,
550, 600, 650, 700, or more residues, e.g., contiguous residues of an exemplary cellulase,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme of the
invention. Peptides of the invention (e.g., a subsequence of an exemplary polypeptide of
the invention) can be useful as, e.g., labeling probes, antigens (immunogens), toleragens,
motifs, cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme active sites (e.g., "catalytic domains"), signal sequences and/or
prepro domains.
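
By way of a minimal sketch, the contiguous-residue fragments of a given size can be enumerated from a polypeptide sequence as shown below. The function name and the short example sequence are hypothetical and serve only to illustrate the idea of a subsequence of contiguous residues; they are not sequences of the invention.

```python
def contiguous_fragments(sequence: str, length: int) -> list[str]:
    """Return every fragment of `length` contiguous residues, in N- to C-terminal order."""
    if length < 1 or length > len(sequence):
        raise ValueError("length must be between 1 and the sequence length")
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Hypothetical 8-residue sequence used purely for illustration.
example = "MKTAYIAK"
five_mers = contiguous_fragments(example, 5)
```

A sequence of n residues yields n - length + 1 such fragments, so the 8-residue example gives four 5-residue fragments.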

In alternative aspects, polypeptides of the invention having cellulase activity, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase activity are 
members of a genus of polypeptides sharing specific structural elements, e.g., amino acid
residues, that correlate with cellulase activity, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase activity. These shared structural elements can be
used for the routine generation of cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase variants. These shared structural elements of
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzymes of the invention can be used as guidance for the routine generation of cellulase,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme
variants within the scope of the genus of polypeptides of the invention.

As used herein, the terms "cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase" encompass any polypeptide or enzyme capable of
catalyzing the complete or partial breakdown and/or hydrolysis of cellulose (e.g.,
exemplary polypeptides of the invention, see also Tables 1, 2, and 3, Examples 1 and 4,
below), or any modification of a cellulose or lignocellulosic material, e.g., a biomass
material comprising lignocellulose.

In some aspects, a polypeptide of the invention can have an alternative enzymatic 
activity, for example, as set forth in Table 3, below. For example, the polypeptide having
a sequence as set forth in SEQ ID NO:164, encoded, e.g., by SEQ ID NO:163, can have
alkaline endoglucanase/cellulase activity; the polypeptide having a sequence as set forth
in SEQ ID NO:110, encoded, e.g., by SEQ ID NO:109, can have xylanase activity; the
polypeptide having a sequence as set forth in SEQ ID NO:12, encoded, e.g., by SEQ ID
NO:11, can have NAD binding oxidoreductase activity; the polypeptide having a
sequence as set forth in SEQ ID NO:118, encoded, e.g., by SEQ ID NO:117, can have
short chain dehydrogenase activity; the polypeptide having a sequence as set forth in SEQ
ID NO:14, encoded, e.g., by SEQ ID NO:13, can have NADH dependent dehydrogenase
activity; the polypeptide having a sequence as set forth in SEQ ID NO:138, encoded, e.g.,
by SEQ ID NO:137, can have peptidase activity; the polypeptide having a sequence as set
forth in SEQ ID NO:162, encoded, e.g., by SEQ ID NO:161, can have alkaline
endoglucanase activity, in addition to cellulase activity; the polypeptide having a
sequence as set forth in SEQ ID NO:42, encoded, e.g., by SEQ ID NO:41, can have
cysteinyl tRNA synthetase activity; the polypeptide having a sequence as set forth in SEQ
ID NO:32, encoded, e.g., by SEQ ID NO:31, can have cellodextrin phosphorylase
activity; the polypeptide having a sequence as set forth in SEQ ID NO:50, encoded, e.g.,
by SEQ ID NO:49, can have fdhd/narq oxidoreductase activity; the polypeptide having a
sequence as set forth in SEQ ID NO:54, encoded, e.g., by SEQ ID NO:53, can have a
radical S-adenosylmethionine (SAM) activity; the polypeptide having a sequence as set
forth in SEQ ID NO:58, encoded, e.g., by SEQ ID NO:57, can have a subtilisin-like
protease activity; etc., as set forth below:

"Amino acid" or "amino acid sequence" as used herein refer to an oligopeptide,
peptide, polypeptide, or protein sequence, or to a fragment, portion, or subunit of any of
these, and to naturally occurring or synthetic molecules. The term "polypeptide" as used
herein refers to amino acids joined to each other by peptide bonds or modified peptide
bonds, i.e., peptide isosteres, and may contain
modified amino acids other than the 20 gene-encoded amino acids. The polypeptides 
may be modified by either natural processes, such as post-translational processing, or by 
chemical modification techniques which are well known in the art. Modifications can 
occur anywhere in the polypeptide, including the peptide backbone, the amino acid side- 
chains and the amino or carboxyl termini. It will be appreciated that the same type of 
modification may be present in the same or varying degrees at several sites in a given 
polypeptide. Also a given polypeptide may have many types of modifications. 
Modifications include acetylation, acylation, ADP-ribosylation, amidation, covalent 
attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a 
nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, 
covalent attachment of a phosphatidylinositol, cross-linking, cyclization, disulfide bond
formation, demethylation, formation of covalent cross-links, formation of cysteine,
formation of pyroglutamate, formylation, gamma-carboxylation, glycosylation, GPI
anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation,
pegylation, glucan hydrolase processing, phosphorylation, prenylation, racemization,
selenoylation, sulfation and transfer-RNA mediated addition of amino acids to protein
such as arginylation. (See Creighton, T.E., Proteins - Structure and Molecular Properties
2nd Ed., W.H. Freeman and Company, New York (1993); Posttranslational Covalent
Modification of Proteins, B.C. Johnson, Ed., Academic Press, New York, pp. 1-12
(1983)). The peptides and polypeptides of the invention also include all "mimetic" and 
"peptidomimetic" forms, as described in further detail, below. 

As used herein, the term "isolated" means that the material (e.g., a protein or
nucleic acid of the invention) is removed from its original environment (e.g., the natural
environment if it is naturally occurring). For example, a naturally-occurring
polynucleotide or polypeptide present in a living animal is not isolated, but the same
polynucleotide or polypeptide, separated from some or all of the coexisting materials in
the natural system, is isolated. Such polynucleotides could be part of a vector and/or such
polynucleotides or polypeptides could be part of a composition and still be isolated in that
such vector or composition is not part of its natural environment. As used herein, the term
"purified" does not require absolute purity; rather, it is intended as a relative definition.
Individual nucleic acids obtained from a library have been conventionally purified to
electrophoretic homogeneity. The sequences obtained from these clones could not be
obtained directly either from the library or from total human DNA. The purified nucleic
acids of the invention have been purified from the remainder of the genomic DNA in the
organism by at least 10^4-10^6 fold. In one aspect, the term "purified" includes nucleic acids
which have been purified from the remainder of the genomic DNA or from other sequences
in a library or other environment by at least one order of magnitude, e.g., in one aspect, two
or three orders, or, four or five orders of magnitude.
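
Because "purified" is defined above as a relative term expressed in orders of magnitude, a short worked calculation may help. The sketch below assumes, purely for illustration, that purification is quantified as the ratio of the fraction of the sample made up by the nucleic acid of interest after purification to that fraction before purification; the function names and the example fractions are hypothetical and are not taken from the specification.

```python
import math


def fold_purification(fraction_before: float, fraction_after: float) -> float:
    """Ratio of the target's share of the sample after vs. before purification."""
    if not (0.0 < fraction_before <= 1.0 and 0.0 < fraction_after <= 1.0):
        raise ValueError("fractions must be greater than 0 and at most 1")
    return fraction_after / fraction_before


def orders_of_magnitude(fold: float) -> float:
    """Express a fold purification as a base-10 order of magnitude."""
    if fold <= 0.0:
        raise ValueError("fold must be positive")
    return math.log10(fold)


# Hypothetical example: a target making up 0.01% of the starting material
# and 10% of the purified material.
fold = fold_purification(0.0001, 0.10)
orders = orders_of_magnitude(fold)
```

Under this assumed measure, the hypothetical example corresponds to roughly a thousand-fold purification, i.e., about three orders of magnitude.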

"Recombinant" polypeptides or proteins refer to polypeptides or proteins 
produced by recombinant DNA techniques; i.e., produced from cells transformed by an
exogenous DNA construct encoding the desired polypeptide or protein. "Synthetic"
polypeptides or proteins are those prepared by chemical synthesis. Solid-phase chemical
peptide synthesis methods can also be used to synthesize the polypeptide or fragments of
the invention. Such methods have been known in the art since the early 1960's (Merrifield,
R. B., J. Am. Chem. Soc., 85:2149-2154, 1963) (See also Stewart, J. M. and Young, J. D.,
Solid Phase Peptide Synthesis, 2nd Ed., Pierce Chemical Co., Rockford, Ill., pp. 11-12)
and have recently been employed in commercially available laboratory peptide design
and synthesis kits (Cambridge Research Biochemicals). Such commercially available
laboratory kits have generally utilized the teachings of H. M. Geysen et al., Proc. Natl.
Acad. Sci., USA, 81:3998 (1984) and provide for synthesizing peptides upon the tips of a
multitude of "rods" or "pins" all of which are connected to a single plate.

The phrase "substantially identical" in the context of two nucleic acids or 
polypeptides, refers to two or more sequences that have, e.g., at least about 50%, 51%, 
52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 
67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 
82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 
97%, 98%, 99%, or more nucleotide or amino acid residue (sequence) identity, when 
compared and aligned for maximum correspondence, as measured using one of the 
known sequence comparison algorithms or by visual inspection. In alternative aspects, 
the substantial identity exists over a region of at least about 100 or more residues and 
most commonly the sequences are substantially identical over at least about 150 to 200 or
more residues. In some aspects, the sequences are substantially identical over the entire
length of the coding regions.

Additionally, a "substantially identical" amino acid sequence is a sequence that
differs from a reference sequence by one or more conservative or non-conservative amino
acid substitutions, deletions, or insertions. In one aspect, the substitution occurs at a site
that is not the active site of the molecule, or, alternatively the substitution occurs at a site
that is the active site of the molecule, provided that the polypeptide essentially retains its
functional (enzymatic) properties. A conservative amino acid substitution, for example,
substitutes one amino acid for another of the same class (e.g., substitution of one
hydrophobic amino acid, such as isoleucine, valine, leucine, or methionine, for another, or
substitution of one polar amino acid for another, such as substitution of arginine for
lysine, glutamic acid for aspartic acid or glutamine for asparagine). One or more amino
acids can be deleted, for example, from a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase polypeptide, resulting in
modification of the structure of the polypeptide, without significantly altering its
biological activity. For example, amino- or carboxyl-terminal amino acids that are not
required for cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme biological activity can be removed. Modified polypeptide sequences
of the invention can be assayed for cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme biological activity by any number of
methods, including contacting the modified polypeptide sequence with a substrate and
determining whether the modified polypeptide decreases the amount of specific substrate
in the assay or increases the bioproducts of the enzymatic reaction of a functional
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
polypeptide with the substrate.

"Fragments" as used herein are a portion of a naturally occurring protein which
can exist in at least two different conformations. Fragments can have the same or
substantially the same amino acid sequence as the naturally occurring protein. Fragments
which have different three-dimensional structures from the naturally occurring protein are
also included. An example of this is a "pro-form" molecule, such as a low activity
proprotein that can be modified by cleavage to produce a mature enzyme with
significantly higher activity.
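
As a minimal sketch, the conservative substitution classes named in the definition of "substantially identical" above can be encoded and used to classify the differences between two sequences of equal length. Only the example residues given in that definition are included (isoleucine, valine, leucine and methionine; arginine and lysine; glutamic acid and aspartic acid; glutamine and asparagine); the grouping into four sets, the one-letter codes, the function names and the example sequences are assumptions of this sketch and do not represent an exhaustive classification.

```python
# Example classes taken from the residues named in the text (one-letter codes).
CONSERVATIVE_CLASSES = [
    {"I", "V", "L", "M"},  # hydrophobic: isoleucine, valine, leucine, methionine
    {"R", "K"},            # arginine / lysine
    {"D", "E"},            # aspartic acid / glutamic acid
    {"N", "Q"},            # asparagine / glutamine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues fall in the same example class."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in cls and replacement in cls for cls in CONSERVATIVE_CLASSES)


def classify_substitutions(reference: str, variant: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, reference residue, variant residue, conservative?) per difference."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only; lengths must match")
    return [
        (pos, ref, var, is_conservative(ref, var))
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


# Hypothetical four-residue sequences used purely for illustration.
differences = classify_substitutions("MKDL", "MRDV")
```

Whether a variant carrying such substitutions retains activity is still determined experimentally, e.g., by the substrate assay described above; the classification only flags which differences fall within the same example class.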

In one aspect, the invention provides crystal (three-dimensional) structures of
proteins and peptides, e.g., cellulases, of the invention, which can be made and analyzed
using the routine protocols well known in the art, e.g., as described in MacKenzie (1998)
Crystal structure of the family 7 endoglucanase I (Cel7B) from Humicola insolens at 2.2
Å resolution and identification of the catalytic nucleophile by trapping of the covalent
glycosyl-enzyme intermediate, Biochem. J. 335:409-416; Sakon (1997) Structure and
mechanism of endo/exocellulase E4 from Thermomonospora fusca, Nat. Struct. Biol.
4:810-818; Varrot (1999) Crystal structure of the catalytic core domain of the family 6
cellobiohydrolase II, Cel6A, from Humicola insolens, at 1.92 Å resolution, Biochem. J.
337:297-304; illustrating and identifying specific structural elements as guidance for the
routine generation of cellulase variants of the invention, and as guidance for identifying
enzyme species within the scope of the invention.

Polypeptides and peptides of the invention can be isolated from natural sources,
be synthetic, or be recombinantly generated polypeptides. Peptides and proteins can be
recombinantly expressed in vitro or in vivo. The peptides and polypeptides of the
invention can be made and isolated using any method known in the art. Polypeptides and
peptides of the invention can also be synthesized, whole or in part, using chemical
methods well known in the art. See e.g., Caruthers (1980) Nucleic Acids Res. Symp. Ser.
215-223; Horn (1980) Nucleic Acids Res. Symp. Ser. 225-232; Banga, A.K., Therapeutic
Peptides and Proteins, Formulation, Processing and Delivery Systems (1995) Technomic
Publishing Co., Lancaster, PA. For example, peptide synthesis can be performed using
various solid-phase techniques (see e.g., Roberge (1995) Science 269:202; Merrifield
(1997) Methods Enzymol. 289:3-13) and automated synthesis may be achieved, e.g.,
using the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the
instructions provided by the manufacturer.

The peptides and polypeptides of the invention can also be glycosylated. The 
glycosylation can be added post-translationally either chemically or by cellular 
biosynthetic mechanisms, wherein the latter incorporates the use of known glycosylation
motifs, which can be native to the sequence or can be added as a peptide or added in the
nucleic acid coding sequence. The glycosylation can be O-linked or N-linked.

The peptides and polypeptides of the invention, as defined above, include all
"mimetic" and "peptidomimetic" forms. The terms "mimetic" and "peptidomimetic"
refer to a synthetic chemical compound which has substantially the same structural and/or
functional characteristics of the polypeptides of the invention. The mimetic can be either
entirely composed of synthetic, non-natural analogues of amino acids, or a chimeric
molecule of partly natural peptide amino acids and partly non-natural analogs of amino
acids. The mimetic can also incorporate any amount of natural amino acid conservative
substitutions as long as such substitutions also do not substantially alter the mimetic's
structure and/or activity. As with polypeptides of the invention which are conservative
variants or members of a genus of polypeptides of the invention (e.g., having about 50%
or more sequence identity to an exemplary sequence of the invention), routine
experimentation will determine whether a mimetic is within the scope of the invention,
i.e., that its structure and/or function is not substantially altered. Thus, in one aspect, a
mimetic composition is within the scope of the invention if it has a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity.

Polypeptide mimetic compositions of the invention can contain any combination
of non-natural structural components. In an alternative aspect, mimetic compositions of the
invention include one or all of the following three structural groups: a) residue linkage
groups other than the natural amide bond ("peptide bond") linkages; b) non-natural
residues in place of naturally occurring amino acid residues; or c) residues which induce
secondary structural mimicry, i.e., to induce or stabilize a secondary structure, e.g., a beta
turn, gamma turn, beta sheet, alpha helix conformation, and the like. For example, a
polypeptide of the invention can be characterized as a mimetic when all or some of its
residues are joined by chemical means other than natural peptide bonds. Individual
peptidomimetic residues can be joined by peptide bonds, other chemical bonds or
coupling means, such as, e.g., glutaraldehyde, N-hydroxysuccinimide esters, bifunctional
maleimides, N,N'-dicyclohexylcarbodiimide (DCC) or N,N'-diisopropylcarbodiimide
(DIC). Linking groups that can be an alternative to the traditional amide bond ("peptide
bond") linkages include, e.g., ketomethylene (e.g., -C(=O)-CH2- for -C(=O)-NH-),
aminomethylene (CH2-NH), ethylene, olefin (CH=CH), ether (CH2-O), thioether (CH2-S),
tetrazole (CN4-), thiazole, retroamide, thioamide, or ester (see, e.g., Spatola (1983) in
Chemistry and Biochemistry of Amino Acids, Peptides and Proteins, Vol. 7, pp 267-357,
"Peptide Backbone Modifications," Marcel Dekker, NY).

A polypeptide of the invention can also be characterized as a mimetic by
containing all or some non-natural residues in place of naturally occurring amino acid
residues. Non-natural residues are well described in the scientific and patent literature; a
few exemplary non-natural compositions useful as mimetics of natural amino acid
residues and guidelines are described below. Mimetics of aromatic amino acids can be
generated by replacing with, e.g., D- or L-naphthylalanine; D- or L-phenylglycine; D- or
L-2 thieneylalanine; D- or L-1, -2, 3-, or 4- pyreneylalanine; D- or L-3 thieneylalanine;
D- or L-(2-pyridinyl)-alanine; D- or L-(3-pyridinyl)-alanine; D- or L-(2-pyrazinyl)-alanine;
D- or L-(4-isopropyl)-phenylglycine; D-(trifluoromethyl)-phenylglycine;
D-(trifluoromethyl)-phenylalanine; D-p-fluoro-phenylalanine; D- or
L-p-biphenylphenylalanine; D- or L-p-methoxy-biphenylphenylalanine; D- or
L-2-indole(alkyl)alanines; and, D- or L-alkylalanines, where alkyl can be substituted or
unsubstituted methyl, ethyl, propyl, hexyl, butyl, pentyl, isopropyl, iso-butyl, sec-isotyl,
iso-pentyl, or non-acidic amino acids. Aromatic rings of a non-natural amino acid
include, e.g., thiazolyl, thiophenyl, pyrazolyl, benzimidazolyl, naphthyl, furanyl, pyrrolyl,
and pyridyl aromatic rings.

Mimetics of acidic amino acids can be generated by substitution by, e.g.,
non-carboxylate amino acids while maintaining a negative charge; (phosphono)alanine;
sulfated threonine. Carboxyl side groups (e.g., aspartyl or glutamyl) can also be
selectively modified by reaction with carbodiimides (R'-N-C-N-R') such as, e.g.,
1-cyclohexyl-3(2-morpholinyl-(4-ethyl) carbodiimide or 1-ethyl-3(4-azonia-4,4-
dimetholpentyl) carbodiimide. Aspartyl or glutamyl can also be converted to asparaginyl
and glutaminyl residues by reaction with ammonium ions. Mimetics of basic amino acids
can be generated by substitution with, e.g., (in addition to lysine and arginine) the amino
acids ornithine, citrulline, or (guanidino)-acetic acid, or (guanidino)alkyl-acetic acid,
where alkyl is defined above. Nitrile derivatives (e.g., containing the CN-moiety in place
of COOH) can be substituted for asparagine or glutamine. Asparaginyl and glutaminyl
residues can be deaminated to the corresponding aspartyl or glutamyl residues. Arginine
residue mimetics can be generated by reacting arginyl with, e.g., one or more
conventional reagents, including, e.g., phenylglyoxal, 2,3-butanedione,
1,2-cyclohexanedione, or ninhydrin, in one aspect under alkaline conditions. Tyrosine residue
mimetics can be generated by reacting tyrosyl with, e.g., aromatic diazonium compounds
or tetranitromethane. N-acetylimidazole and tetranitromethane can be used to form
O-acetyl tyrosyl species and 3-nitro derivatives, respectively. Cysteine residue mimetics
can be generated by reacting cysteinyl residues with, e.g., alpha-haloacetates such as
2-chloroacetic acid or chloroacetamide and corresponding amines, to give carboxymethyl or
carboxyamidomethyl derivatives. Cysteine residue mimetics can also be generated by
reacting cysteinyl residues with, e.g., bromo-trifluoroacetone,
alpha-bromo-beta-(5-imidazoyl) propionic acid; chloroacetyl phosphate, N-alkylmaleimides,
3-nitro-2-pyridyl disulfide; methyl 2-pyridyl disulfide; p-chloromercuribenzoate;
2-chloromercuri-4 nitrophenol; or, chloro-7-nitrobenzo-oxa-1,3-diazole. Lysine mimetics
can be generated (and amino terminal residues can be altered) by reacting lysinyl with,
e.g., succinic or other carboxylic acid anhydrides. Lysine and other
alpha-amino-containing residue mimetics can also be generated by reaction with
imidoesters, such as methyl picolinimidate, pyridoxal phosphate, pyridoxal,
chloroborohydride, trinitrobenzenesulfonic acid, O-methylisourea, 2,4-pentanedione, and
transamidase-catalyzed reactions with glyoxylate. Mimetics of methionine can be
generated by reaction with, e.g., methionine sulfoxide. Mimetics of proline include, e.g.,
pipecolic acid, thiazolidine carboxylic acid, 3- or 4-hydroxy proline, dehydroproline, 3- or
4-methylproline, or 3,3-dimethylproline. Histidine residue mimetics can be generated by
reacting histidyl with, e.g., diethylpyrocarbonate or para-bromophenacyl bromide. Other
mimetics include, e.g., those generated by hydroxylation of proline and lysine;
phosphorylation of the hydroxyl groups of seryl or threonyl residues; methylation of the
alpha-amino groups of lysine, arginine and histidine; acetylation of the N-terminal amine;
methylation of main chain amide residues or substitution with N-methyl amino acids; or
amidation of C-terminal carboxyl groups.

In one aspect, a residue, e.g., an amino acid, of a polypeptide of the invention can 
also be replaced by an amino acid (or peptidomimetic residue) of the opposite chirality.
In one aspect, any amino acid naturally occurring in the L-configuration (which can also 
be referred to as the R or S, depending upon the structure of the chemical entity) can be 
replaced with the amino acid of the same chemical structural type or a peptidomimetic, 
but of the opposite chirality, referred to as the D- amino acid, but also can be referred to 
as the R- or S- form. 

The invention also provides methods for modifying the polypeptides of the
invention by either natural processes, such as post-translational processing (e.g.,
phosphorylation, acylation, etc.), or by chemical modification techniques, and the
resulting modified polypeptides. Modifications can occur anywhere in the polypeptide,
including the peptide backbone, the amino acid side-chains and the amino or carboxyl
termini. It will be appreciated that the same type of modification may be present in the
same or varying degrees at several sites in a given polypeptide. Also a given polypeptide
may have many types of modifications. In one aspect, modifications include acetylation,
acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent
attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide
derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of a
phosphatidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation,
formation of covalent cross-links, formation of cysteine, formation of pyroglutamate,
formylation, gamma-carboxylation, glycosylation, GPI anchor formation, hydroxylation,
iodination, methylation, myristoylation, oxidation, pegylation, proteolytic processing,
phosphorylation, prenylation, racemization, selenoylation, sulfation, and transfer-RNA
mediated addition of amino acids to protein such as arginylation. See, e.g., Creighton,
T.E., Proteins - Structure and Molecular Properties 2nd Ed., W.H. Freeman and
Company, New York (1993); Posttranslational Covalent Modification of Proteins, B.C.
Johnson, Ed., Academic Press, New York, pp. 1-12 (1983).

Solid-phase chemical peptide synthesis methods can also be used to synthesize the 
polypeptide or fragments of the invention. Such methods have been known in the art since
the early 1960's (Merrifield, R. B., J. Am. Chem. Soc., 85:2149-2154, 1963) (See also
Stewart, J. M. and Young, J. D., Solid Phase Peptide Synthesis, 2nd Ed., Pierce Chemical
Co., Rockford, Ill., pp. 11-12) and have recently been employed in commercially
available laboratory peptide design and synthesis kits (Cambridge Research
Biochemicals). Such commercially available laboratory kits have generally utilized the
teachings of H. M. Geysen et al., Proc. Natl. Acad. Sci., USA, 81:3998 (1984) and provide
for synthesizing peptides upon the tips of a multitude of "rods" or "pins" all of which are
connected to a single plate. When such a system is utilized, a plate of rods or pins is
inverted and inserted into a second plate of corresponding wells or reservoirs, which
contain solutions for attaching or anchoring an appropriate amino acid to the pins' or rods'
tips. By repeating such a process step, i.e., inverting and inserting the rods' and pins' tips
into appropriate solutions, amino acids are built into desired peptides. In addition, a
number of FMOC peptide synthesis systems are available. For example,
assembly of a polypeptide or fragment can be carried out on a solid support using an
Applied Biosystems, Inc. Model 431A™ automated peptide synthesizer. Such equipment
provides ready access to the peptides of the invention, either by direct synthesis or by
synthesis of a series of fragments that can be coupled using other known techniques.

The polypeptides of the invention include cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes in an active or inactive 
form. For example, the polypeptides of the invention include proproteins before
"maturation" or processing of prepro sequences, e.g., by a proprotein-processing enzyme,
such as a proprotein convertase to generate an "active" mature protein. The polypeptides
of the invention include cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzymes inactive for other reasons, e.g., before "activation" by a
post-translational processing event, e.g., an endo- or exo-peptidase or proteinase action, a
phosphorylation event, an amidation, a glycosylation or a sulfation, a dimerization event,
and the like. The polypeptides of the invention include all active forms, including active
subsequences, e.g., catalytic domains or active sites, of the enzyme.

The invention includes immobilized cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes, anti-cellulase, e.g., anti- 
endoglucanase, anti-cellobiohydrolase and/or anti-beta-glucosidase antibodies and 
fragments thereof. The invention provides methods for inhibiting cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity, 
e.g., using dominant negative mutants or anti-cellulase, e.g., anti-endoglucanase, anti- 
cellobiohydrolase and/or anti-beta-glucosidase antibodies of the invention. The invention 
includes heterocomplexes, e.g., fusion proteins, heterodimers, etc., comprising the 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzymes of the invention. 

Polypeptides of the invention can have a cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity under various 
conditions, e.g., extremes in pH and/or temperature, oxidizing agents, and the like. The 
invention provides methods leading to alternative cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme preparations with different 
catalytic efficiencies and stabilities, e.g., towards temperature, oxidizing agents and 
changing wash conditions. In one aspect, cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme variants can be produced 
using techniques of site-directed mutagenesis and/or random mutagenesis. In one aspect, 
directed evolution can be used to produce a great variety of cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme variants with alternative 
specificities and stability. 

The proteins of the invention are also useful as research reagents to identify 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzyme modulators, e.g., activators or inhibitors of cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity. Briefly, test 
samples (compounds, broths, extracts, and the like) are added to cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme assays to 
determine their ability to inhibit substrate cleavage. Inhibitors identified in this way can 
be used in industry and research to reduce or prevent undesired proteolysis. As with 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzymes, inhibitors can be combined to increase the spectrum of activity. 

The enzymes of the invention are also useful as research reagents to digest 
proteins or in protein sequencing. For example, the cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes may be used to break 
polypeptides into smaller fragments for sequencing using, e.g., an automated sequencer. 

The invention also provides methods of discovering new cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes using the 
nucleic acids, polypeptides and antibodies of the invention. In one aspect, phagemid 
libraries are screened for expression-based discovery of cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes. In another aspect, 
lambda phage libraries are screened for expression-based discovery of cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes. 
Screening of the phage or phagemid libraries can allow the detection of toxic clones; 
improved access to substrate; reduced need for engineering a host, by-passing the 
potential for any bias resulting from mass excision of the library; and, faster growth at 
low clone densities. Screening of phage or phagemid libraries can be in liquid phase or in 
solid phase. In one aspect, the invention provides screening in liquid phase. This gives a 
greater flexibility in assay conditions; additional substrate flexibility; higher sensitivity 
for weak clones; and ease of automation over solid phase screening. 

The invention provides screening methods using the proteins and nucleic acids of 
the invention and robotic automation to enable the execution of many thousands of 
biocatalytic reactions and screening assays in a short period of time, e.g., per day, as well 
as ensuring a high level of accuracy and reproducibility (see discussion of arrays, below). 
As a result, a library of derivative compounds can be produced in a matter of weeks. For 
further teachings on modification of molecules, including small molecules, see 
PCT/US94/09174; U.S. Pat. No. 6,245,547. 

In one aspect, polypeptides or fragments of the invention are obtained through 
biochemical enrichment or purification procedures. The sequence of potentially 
homologous polypeptides or fragments may be determined by cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme assays 
(see, e.g., Examples 1, 2 and 3, below), gel electrophoresis and/or microsequencing. The 
sequence of the prospective polypeptide or fragment of the invention can be compared to 
an exemplary polypeptide of the invention, or a fragment, e.g., comprising at least about 
5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 or more consecutive amino acids thereof 
using any of the programs described above. 

Another aspect of the invention is an assay for identifying fragments or variants of 
the invention, which retain the enzymatic function of the polypeptides of the invention. 
For example, the fragments or variants of said polypeptides may be used to catalyze 
biochemical reactions, which indicate that the fragment or variant retains the enzymatic 
activity of a polypeptide of the invention. An exemplary assay for determining if 
fragments or variants retain the enzymatic activity of the polypeptides of the invention 
includes the steps of: contacting the polypeptide fragment or variant with a substrate 
molecule under conditions which allow the polypeptide fragment or variant to function 
and detecting either a decrease in the level of substrate or an increase in the level of the 
specific reaction product of the reaction between the polypeptide and substrate. 
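
By way of illustration only, the detection step of such an assay can be expressed as a simple decision rule. The following is a minimal sketch, not a protocol of the invention; the function name, the argument names and the `min_change` threshold are assumptions introduced here, and the units are whatever the chosen assay reports.

```python
def retains_activity(substrate_start, substrate_end,
                     product_start, product_end,
                     min_change=0.0):
    """Return True if substrate decreased or product increased by more than min_change.

    min_change is a hypothetical noise threshold; a real assay would derive it
    from no-enzyme controls.
    """
    substrate_consumed = substrate_start - substrate_end
    product_formed = product_end - product_start
    return substrate_consumed > min_change or product_formed > min_change


# Example readings (invented numbers, arbitrary units).
active = retains_activity(10.0, 7.5, 0.0, 2.5, min_change=0.5)
inactive = retains_activity(10.0, 10.0, 0.0, 0.0)
```

Either signal alone is sufficient in this sketch, mirroring the "either ... or" wording of the assay steps.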

The present invention exploits the unique catalytic properties of enzymes. 
Whereas the use of biocatalysts (i.e., purified or crude enzymes, non-living or living 
cells) in chemical transformations normally requires the identification of a particular 
biocatalyst that reacts with a specific starting compound, the present invention uses 
selected biocatalysts and reaction conditions that are specific for functional groups that 
are present in many starting compounds, such as small molecules. Each biocatalyst is 
specific for one functional group, or several related functional groups, and can react with 
many starting compounds containing this functional group. 

In one aspect, the biocatalytic reactions produce a population of derivatives from a 
single starting compound. These derivatives can be subjected to another round of 
biocatalytic reactions to produce a second population of derivative compounds. 
Thousands of variations of the original small molecule or compound can be produced 
with each iteration of biocatalytic derivatization. 

Enzymes react at specific sites of a starting compound without affecting the rest of 
the molecule, a process which is very difficult to achieve using traditional chemical 
methods. This high degree of biocatalytic specificity provides the means to identify a 
single active compound within the library. The library is characterized by the series of 
biocatalytic reactions used to produce it, a so-called "biosynthetic history". Screening the 
library for biological activities and tracing the biosynthetic history identifies the specific 
reaction sequence producing the active compound. The reaction sequence is repeated and 
the structure of the synthesized compound determined. This mode of identification, unlike 
other synthesis and screening approaches, does not require immobilization technologies 
and compounds can be synthesized and tested free in solution using virtually any type of 
screening assay. It is important to note that the high degree of specificity of enzyme 
reactions on functional groups allows for the "tracking" of specific enzymatic reactions 
that make up the biocatalytically produced library. 
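
The notion of a "biosynthetic history" can be illustrated with a small bookkeeping sketch. It assumes, purely for illustration, that each biocatalytic reaction can be modeled as a function acting on a text label for the compound; the reaction names and labels below are hypothetical and do not describe any particular enzyme of the invention.

```python
from itertools import product


def build_library(start, reactions, rounds):
    """Map each biosynthetic history (a tuple of reaction names) to the derivative it yields."""
    library = {}
    for n in range(1, rounds + 1):
        for history in product(sorted(reactions), repeat=n):
            compound = start
            for name in history:
                compound = reactions[name](compound)
            library[history] = compound
    return library


def trace(library, active_compound):
    """Return every recorded history that produces the active compound."""
    return [h for h, c in library.items() if c == active_compound]


# Hypothetical reactions: each one appends a tag to the compound label.
reactions = {
    "acylation": lambda c: c + "-Ac",
    "glycosylation": lambda c: c + "-Glc",
    "oxidation": lambda c: c + "=O",
}
library = build_library("core", reactions, rounds=2)
hits = trace(library, "core-Glc-Ac")
```

With three reactions and two rounds the sketch records 3 + 9 = 12 derivatives, and tracing a hit returns the ordered reaction sequence that would be repeated to resynthesize it.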

In one aspect, procedural steps are performed using robotic automation enabling 
the execution of many thousands of biocatalytic reactions and/or screening assays per day 
as well as ensuring a high level of accuracy and reproducibility. Robotic automation can 
also be used to screen for cellulase activity to determine if a polypeptide is within the 
scope of the invention. As a result, in one aspect, a library of derivative compounds can 
be produced in a matter of weeks which would take years to produce using "traditional" 
chemical or enzymatic screening methods. 

In a particular aspect, the invention provides a method for modifying small 
molecules, comprising contacting a polypeptide encoded by a polynucleotide described 
herein or enzymatically active fragments thereof with a small molecule to produce a 
modified small molecule. A library of modified small molecules is tested to determine if 
a modified small molecule is present within the library, which exhibits a desired activity. 
A specific biocatalytic reaction which produces the modified small molecule of desired 
activity is identified by systematically eliminating each of the biocatalytic reactions used 
to produce a portion of the library and then testing the small molecules produced in the 
portion of the library for the presence or absence of the modified small molecule with the 
desired activity. The specific biocatalytic reactions which produce the modified small 
molecule of desired activity are optionally repeated. The biocatalytic reactions are 
conducted with a group of biocatalysts that react with distinct structural moieties found 
within the structure of a small molecule; each biocatalyst is specific for one structural 
moiety or a group of related structural moieties; and each biocatalyst reacts with many 
different small molecules which contain the distinct structural moiety. 
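
The systematic elimination described above can be sketched as a leave-one-out loop. This is an illustrative outline under stated assumptions: `make_and_test` is a hypothetical callback standing in for producing a portion of the library with the given reactions and screening it, and the mock screen below is invented for the example.

```python
def find_required_reactions(reaction_names, make_and_test):
    """Omit each reaction in turn; a reaction is required if the activity is lost without it."""
    required = []
    for omitted in reaction_names:
        remaining = [r for r in reaction_names if r != omitted]
        if not make_and_test(remaining):
            required.append(omitted)
    return required


def mock_screen(reactions_used):
    """Hypothetical stand-in for synthesis plus screening of a sub-library."""
    return "glycosylation" in reactions_used


all_reactions = ["acylation", "glycosylation", "oxidation"]
required = find_required_reactions(all_reactions, mock_screen)
```

In practice the callback would wrap a wet-lab or robotic workflow; the loop structure is the only part this sketch is meant to convey.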

Cellulase, e.g., endoglucanase, cellobiohydrolase and/or beta-glucosidase enzyme 
signal sequences, prepro and catalytic domains 

The invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme signal sequences (e.g., signal peptides (SPs)), 
prepro domains and catalytic domains (CDs). The SPs, prepro domains and/or CDs of the 
invention can be isolated or recombinant peptides or can be part of a fusion protein, e.g., 
as a heterologous domain in a chimeric protein. The invention provides nucleic acids 
encoding these catalytic domains (CDs), prepro domains and signal sequences (SPs, e.g., 
a peptide having a sequence comprising/consisting of amino terminal residues of a 
polypeptide of the invention). 

The invention provides isolated or recombinant signal sequences (e.g., signal 
peptides) consisting of or comprising a sequence as set forth in residues 1 to 14, 1 to 15, 1 
to 16, 1 to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1 
to 27, 1 to 28, 1 to 28, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1 
to 38, 1 to 40, 1 to 41, 1 to 42, 1 to 43, 1 to 44, 1 to 45, 1 to 46, or 1 to 47, or more, of a 
polypeptide of the invention, e.g., exemplary polypeptides of the invention, see also Table 
3, Examples 1 and 4, below, and Sequence Listing. For example, Table 3, above, sets 
forth exemplary signal (leader) sequences of the invention; e.g., the polypeptide 
having a sequence as set forth in SEQ ID NO:164, encoded, e.g., by SEQ ID NO:163, has 
a signal sequence comprising (or consisting of) the amino terminal 30 residues, or, 
MSCRTLMSRRVGWGLLLWGGLFLRTGSVTG. Additional signal sequences are similarly set 
forth in Table 3. 
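
As a minimal sketch of how candidate amino terminal fragments of the stated lengths can be enumerated from a polypeptide sequence, consider the following. The 30-residue signal sequence is the one quoted above; the downstream placeholder residues, the function name and the default length bounds are assumptions made only so the example is self-contained.

```python
def n_terminal_candidates(sequence, min_len=14, max_len=47):
    """Return {length: amino terminal fragment} for every length in range that fits the sequence."""
    upper = min(max_len, len(sequence))
    return {n: sequence[:n] for n in range(min_len, upper + 1)}


# Signal sequence quoted in the text (amino terminal 30 residues).
SIGNAL_30 = "MSCRTLMSRRVGWGLLLWGGLFLRTGSVTG"

# Hypothetical precursor: the quoted signal followed by placeholder residues,
# NOT the real downstream sequence.
precursor = SIGNAL_30 + "A" * 40

candidates = n_terminal_candidates(precursor)
```

Here `candidates[30]` recovers the quoted 30-residue leader; which length corresponds to the true cleavage site must still be established experimentally or by prediction.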

In one aspect, the invention provides signal sequences comprising the first 14, 15, 
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 
64, 65, 66, 67, 68, 69, 70 or more amino terminal residues of a polypeptide of the 
invention. 

The invention includes polypeptides with or without a signal sequence and/or a 
prepro sequence. The invention includes polypeptides with heterologous signal 
sequences and/or prepro sequences. The prepro sequence (including a sequence of the 
invention used as a heterologous prepro domain) can be located on the amino terminal or 
the carboxy terminal end of the protein. The invention also includes isolated or 
recombinant signal sequences, prepro sequences and catalytic domains (e.g., "active 
sites") comprising sequences of the invention. The polypeptide comprising a signal 
sequence of the invention can be a cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme of the invention or another cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme or another 
enzyme or other polypeptide. Methods for identifying "prepro" domain sequences and 
signal sequences are well known in the art, see, e.g., Van de Ven (1993) Crit. Rev. Oncog. 
4(2):115-136. For example, to identify a prepro sequence, the protein is purified from the 
extracellular space and the N-terminal protein sequence is determined and compared to 
the unprocessed form. 
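
The comparison of the determined N-terminal sequence with the unprocessed form can be sketched as a simple string search. The sequences below are invented placeholders, and the function name is an assumption; the sketch takes the first match only, so a very short N-terminal read could match at an unintended position.

```python
def find_processed_leader(unprocessed, mature_n_terminus):
    """Return the leading residues removed during processing, or None if no match is found."""
    index = unprocessed.find(mature_n_terminus)
    if index == -1:
        return None
    return unprocessed[:index]


# Hypothetical sequences for illustration only.
leader = "MKTLLAVSAAQA"
mature = "ETPYDNGWRS"
unprocessed = leader + mature

# Suppose N-terminal sequencing of the secreted protein read the first six residues.
removed = find_processed_leader(unprocessed, mature[:6])
```
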

The cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta- 
glucosidase enzyme signal sequences (SPs) and/or prepro sequences of the invention can 
be isolated or recombinant peptides, or, sequences joined to another cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme or a non- 
cellulase, e.g., non-endoglucanase, non-cellobiohydrolase and/or non-beta-glucosidase 
polypeptide, e.g., as a fusion (chimeric) protein. In one aspect, the invention provides 
polypeptides comprising cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme signal sequences of the invention. In one aspect, 
polypeptides comprising cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme signal sequences (SPs) and/or prepro sequences of the invention 
comprise sequences heterologous to a cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzyme of the invention (e.g., a fusion protein 
comprising an SP and/or prepro of the invention and sequences from another cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme or a 
non-cellulase, e.g., non-endoglucanase, non-cellobiohydrolase and/or non-beta- 
glucosidase protein). In one aspect, the invention provides cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of the invention with 
heterologous SPs and/or prepro sequences, e.g., sequences with a yeast signal sequence. 
A cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzyme of the invention can comprise a heterologous SP and/or prepro in a vector, e.g., a 
pPIC series vector (Invitrogen, Carlsbad, CA). 

In one aspect, SPs and/or prepro sequences of the invention are identified 
following identification of novel cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase polypeptides. The pathways by which proteins are 
sorted and transported to their proper cellular location are often referred to as protein 
targeting pathways. One of the most important elements in all of these targeting systems 
is a short amino acid sequence at the amino terminus of a newly synthesized polypeptide 
called the signal sequence. This signal sequence directs a protein to its appropriate 
location in the cell and is removed during transport or when the protein reaches its final 
destination. Most lysosomal, membrane, or secreted proteins have an amino-terminal 
signal sequence that marks them for translocation into the lumen of the endoplasmic 
reticulum. The signal sequences can vary in length from about 10 to 65, or more, amino 
acid residues. Various methods of recognition of signal sequences are known to those of 
skill in the art. For example, in one aspect, novel cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme signal peptides are 
identified by a method referred to as SignalP. SignalP uses a combined neural network 
which recognizes both signal peptides and their cleavage sites (Nielsen (1997) 
"Identification of prokaryotic and eukaryotic signal peptides and prediction of their 
cleavage sites." Protein Engineering 10:1-6). 

In some aspects cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzymes of the invention do not have SPs and/or prepro 
sequences or "domains." In one aspect, the invention provides the cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of the 
invention lacking all or part of an SP and/or a prepro domain. In one aspect, the 
invention provides a nucleic acid sequence encoding a signal sequence (SP) and/or prepro 
from one cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta- 
glucosidase enzyme operably linked to a nucleic acid sequence of a different cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme or, 
optionally, a signal sequence (SPs) and/or prepro domain from a non-cellulase, e.g., non- 
endoglucanase, non-cellobiohydrolase and/or non-beta-glucosidase protein may be 
desired. 

The invention also provides isolated or recombinant polypeptides comprising 
signal sequences (SPs), prepro domain and/or catalytic domains (CDs) of the invention 
and heterologous sequences. The heterologous sequences are sequences not naturally 
associated (e.g., to an enzyme) with an SP, prepro domain and/or CD. The sequence to 
which the SP, prepro domain and/or CD are not naturally associated can be on the SP's, 
prepro domain and/or CD's amino terminal end, carboxy terminal end, and/or on both 
ends of the SP and/or CD. In one aspect, the invention provides an isolated or 
recombinant polypeptide comprising (or consisting of) a polypeptide comprising a signal 
sequence (SP), prepro domain and/or catalytic domain (CD) of the invention with the 
proviso that it is not associated with any sequence to which it is naturally associated (e.g., 
a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzyme sequence). Similarly in one aspect, the invention provides isolated or 
recombinant nucleic acids encoding these polypeptides. Thus, in one aspect, the isolated 
or recombinant nucleic acid of the invention comprises coding sequence for a signal 
sequence (SP), prepro domain and/or catalytic domain (CD) of the invention and a 
heterologous sequence (i.e., a sequence not naturally associated with the signal 
sequence (SP), prepro domain and/or catalytic domain (CD) of the invention). The 
heterologous sequence can be on the 3' terminal end, 5' terminal end, and/or on both ends 
of the SP, prepro domain and/or CD coding sequence. 

Hybrid (chimeric) cellulase, e.g., endoglucanase, cellobiohydrolase and/or beta- 
glucosidase enzymes and peptide libraries 

In one aspect, the invention provides hybrid cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes and fusion proteins, 
including peptide libraries, comprising sequences of the invention. The peptide libraries 
of the invention can be used to isolate peptide modulators (e.g., activators or inhibitors) of 
targets, such as cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta- 
glucosidase enzyme substrates, receptors, enzymes. The peptide libraries of the invention 
can be used to identify formal binding partners of targets, such as ligands, e.g., cytokines, 
hormones and the like. In one aspect, the invention provides chimeric proteins 
comprising a signal sequence (SP), prepro domain and/or catalytic domain (CD) of the 
invention or a combination thereof and a heterologous sequence (see above). 

In one aspect, the fusion proteins of the invention (e.g., the peptide moiety) are 
conformationally stabilized (relative to linear peptides) to allow a higher binding affinity 
for targets. The invention provides fusions of cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of the invention and other 
peptides, including known and random peptides. They can be fused in such a manner that 
the structure of the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase enzymes is not significantly perturbed and the peptide is metabolically 
or structurally conformationally stabilized. This allows the creation of a peptide library 
that is easily monitored both for its presence within cells and its quantity. 

Amino acid sequence variants of the invention can be characterized by a 
predetermined nature of the variation, a feature that sets them apart from a naturally 
occurring form, e.g., an allelic or interspecies variation of a cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme sequence. In one aspect, 
the variants of the invention exhibit the same qualitative biological activity as the 
naturally occurring analogue. Alternatively, the variants can be selected for having 
modified characteristics. In one aspect, while the site or region for introducing an amino 
acid sequence variation is predetermined, the mutation per se need not be predetermined. 
For example, in order to optimize the performance of a mutation at a given site, random 
mutagenesis may be conducted at the target codon or region and the expressed cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
variants screened for the optimal combination of desired activity. Techniques for making 
substitution mutations at predetermined sites in DNA having a known sequence are well 
known, as discussed herein, for example, M13 primer mutagenesis and PCR mutagenesis. 
Screening of the mutants can be done using, e.g., assays of glucan hydrolysis. In 
alternative aspects, amino acid substitutions can be single residues; insertions can be on 
the order of from about 1 to 20 amino acids, although considerably larger insertions can 
be done. Deletions can range from about 1 to about 20, 30, 40, 50, 60, 70 residues or 
more. To obtain a final derivative with the optimal properties, substitutions, deletions, 
insertions or any combination thereof may be used. Generally, these changes are done on 
a few amino acids to minimize the alteration of the molecule. However, larger changes 
may be tolerated in certain circumstances. 

The invention provides cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzymes where the structure of the polypeptide 
backbone, the secondary or the tertiary structure, e.g., an alpha-helical or beta-sheet 
structure, has been modified. In one aspect, the charge or hydrophobicity has been 
modified. In one aspect, the bulk of a side chain has been modified. Substantial changes 
in function or immunological identity are made by selecting substitutions that are less 
conservative. For example, substitutions can be made which more significantly affect: 
the structure of the polypeptide backbone in the area of the alteration, for example an 
alpha-helical or a beta-sheet structure; a charge or a hydrophobic site of the molecule, 
which can be at an active site; or a side chain. The invention provides substitutions in 
a polypeptide of the invention where (a) a hydrophilic residue, e.g., seryl or threonyl, is 
substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or 
alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue 
having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or 
by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky 
side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., 
glycine. The variants can exhibit the same qualitative biological activity (i.e., a cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
activity) although variants can be selected to modify the characteristics of the cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes as 
needed. 
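
The substitution classes (a) through (d) above can be checked mechanically for a given single-residue change. The following sketch is illustrative only: it uses one-letter codes and limits each class to the example residues named in the text, which is an assumption (the text gives them as examples, not as an exhaustive list).

```python
# Residue groups limited to the examples named in the text (an assumption).
HYDROPHILIC = set("ST")        # seryl, threonyl
HYDROPHOBIC = set("LIFVA")     # leucyl, isoleucyl, phenylalanyl, valyl, alanyl
CYS_PRO = set("CP")            # cysteine, proline
ELECTROPOSITIVE = set("KRH")   # lysyl, arginyl, histidyl
ELECTRONEGATIVE = set("ED")    # glutamyl, aspartyl
BULKY = set("F")               # phenylalanine
NO_SIDE_CHAIN = set("G")       # glycine


def _swap(old, new, group_a, group_b):
    """True if the change goes from one group to the other, in either direction ("for or by")."""
    return (old in group_a and new in group_b) or (old in group_b and new in group_a)


def classify_substitution(old, new):
    """Return the list of classes ("a" to "d") that a single-residue substitution falls into."""
    if old == new:
        return []
    classes = []
    if _swap(old, new, HYDROPHILIC, HYDROPHOBIC):
        classes.append("a")
    if old in CYS_PRO or new in CYS_PRO:
        classes.append("b")
    if _swap(old, new, ELECTROPOSITIVE, ELECTRONEGATIVE):
        classes.append("c")
    if _swap(old, new, BULKY, NO_SIDE_CHAIN):
        classes.append("d")
    return classes
```

For example, `classify_substitution("S", "L")` falls in class (a) and `classify_substitution("K", "E")` in class (c); a substitution outside all four example sets returns an empty list, which does not by itself imply the change is conservative.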

In one aspect, cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase enzymes of the invention comprise epitopes or purification tags, signal 
sequences or other fusion sequences, etc. In one aspect, the cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of the 
invention can be fused to a random peptide to form a fusion polypeptide. By "fused" or 
"operably linked" herein is meant that the random peptide and the cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme are linked 
together, in such a manner as to minimize the disruption to the stability of the cellulase, 
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme 
structure, e.g., it retains cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme activity. The fusion polypeptide (or fusion 
polynucleotide encoding the fusion polypeptide) can comprise further components as 
well, including multiple peptides at multiple loops. 

In one aspect, the peptides and nucleic acids encoding them are randomized, either 
fully randomized or they are biased in their randomization, e.g., in nucleotide/residue 
frequency generally or per position. "Randomized" means that each nucleic acid and 
peptide consists of essentially random nucleotides and amino acids, respectively. In one 
aspect, the nucleic acids which give rise to the peptides can be chemically synthesized, 
and thus may incorporate any nucleotide at any position. Thus, when the nucleic acids are 
expressed to form peptides, any amino acid residue may be incorporated at any position. 
The synthetic process can be designed to generate randomized nucleic acids, to allow the 
formation of all or most of the possible combinations over the length of the nucleic acid, 
thus forming a library of randomized nucleic acids. The library can provide a sufficiently 
structurally diverse population of randomized expression products to affect a 
probabilistically sufficient range of cellular responses to provide one or more cells 
exhibiting a desired response. Thus, the invention provides an interaction library large 
enough so that at least one of its members will have a structure that gives it affinity for 
some molecule, protein, or other factor. 
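
A fully randomized or biased peptide library of the kind described can be sketched in silico as follows. This is a minimal illustration, not a synthesis protocol; the function names, the `weights` and `position_weights` parameters and the fixed seed are assumptions introduced here. `weights` biases residue frequency generally, while `position_weights` biases it at individual positions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(rng, length, weights=None, position_weights=None):
    """Draw one peptide; position_weights maps a position index to its own weight list."""
    position_weights = position_weights or {}
    residues = []
    for pos in range(length):
        w = position_weights.get(pos, weights)  # per-position bias overrides the general bias
        residues.append(rng.choices(AMINO_ACIDS, weights=w, k=1)[0])
    return "".join(residues)


def random_peptide_library(size, length, weights=None, position_weights=None, seed=0):
    """Return a list of `size` randomized peptides, reproducible for a given seed."""
    rng = random.Random(seed)
    return [random_peptide(rng, length, weights, position_weights) for _ in range(size)]


# Example: 100 octapeptides, position 0 fixed to methionine, other positions uniform.
only_met = [1 if aa == "M" else 0 for aa in AMINO_ACIDS]
peptides = random_peptide_library(100, 8, position_weights={0: only_met})
```
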

In one aspect, a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase enzyme of the invention is a multidomain enzyme that comprises 
a signal peptide, a carbohydrate binding module, a cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme catalytic domain, a linker 
and/or another catalytic domain. 

The invention provides methods and sequences for generating chimeric 
polypeptides which may encode biologically active hybrid polypeptides (e.g., hybrid 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzymes). In one aspect, the original polynucleotides (e.g., an exemplary nucleic acid of 
the invention) encode biologically active polypeptides. In one aspect, a method of the 
invention produces new hybrid polypeptides by utilizing cellular processes which 
integrate the sequence of the original polynucleotides such that the resulting hybrid 
polynucleotide encodes a polypeptide demonstrating activities derived, but different, from 
the original biologically active polypeptides (e.g., cellulase or antibody of the invention). 
For example, the original polynucleotides may encode a particular enzyme (e.g., 
cellulase) from or found in different microorganisms. An enzyme encoded by a first 
polynucleotide from one organism or variant may, for example, function effectively under 
a particular environmental condition, e.g., high salinity. An enzyme encoded by a second 
polynucleotide from a different organism or variant may function effectively under a 
different environmental condition, such as extremely high temperatures. A hybrid 
polynucleotide containing sequences from the first and second original polynucleotides 
may encode an enzyme which exhibits characteristics of both enzymes encoded by the 
original polynucleotides. Thus, the enzyme encoded by the hybrid polynucleotide of the 
invention may function effectively under environmental conditions shared by each of the 
enzymes encoded by the first and second polynucleotides, e.g., high salinity and extreme 
temperatures. 

In one aspect, a hybrid polypeptide generated by a method of the invention may 
exhibit specialized enzyme activity not displayed in the original enzymes. For example, 
following recombination and/or reductive reassortment of polynucleotides encoding 
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
enzymes, the resulting hybrid polypeptide encoded by a hybrid polynucleotide can be 
screened for specialized non-cellulase, e.g., non-endoglucanase, non-cellobiohydrolase 
and/or non-beta-glucosidase enzyme activities, e.g., hydrolase, peptidase, phosphorylase, 
etc., activities, obtained from each of the original enzymes. In one aspect, the hybrid 
polypeptide is screened to ascertain those chemical functionalities which distinguish the 
hybrid polypeptide from the original parent polypeptides, such as the temperature, pH or 
salt concentration at which the hybrid polypeptide functions. 

In one aspect, the invention relates to a method for producing a biologically active 
hybrid polypeptide and screening such a polypeptide for enhanced activity by: 

1) introducing at least a first polynucleotide in operable linkage and a second 
polynucleotide in operable linkage, the at least first polynucleotide and 
second polynucleotide sharing at least one region of partial sequence 
homology, into a suitable host cell; 
2) growing the host cell under conditions which promote sequence 
reorganization resulting in a hybrid polynucleotide in operable linkage; 
3) expressing a hybrid polypeptide encoded by the hybrid polynucleotide; 
4) screening the hybrid polypeptide under conditions which promote 
identification of enhanced biological activity; and 
5) isolating the polynucleotide encoding the hybrid polypeptide. 
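
The logic of steps 1) through 5) can be illustrated, purely in silico, by joining two sequences at a shared region and ranking the resulting hybrids with a scoring function. This sketch is an analogy under stated assumptions: the method above relies on sequence reorganization within a host cell, whereas the code below simply splices text strings at an exact shared k-mer; the sequences, the value of `k` and the scoring function are invented for the example.

```python
def shared_regions(first, second, k):
    """Yield (i, j) pairs where first[i:i+k] equals second[j:j+k]."""
    index = {}
    for j in range(len(second) - k + 1):
        index.setdefault(second[j:j + k], []).append(j)
    for i in range(len(first) - k + 1):
        for j in index.get(first[i:i + k], []):
            yield i, j


def crossover_hybrids(first, second, k=6):
    """Return the set of hybrids formed by switching parent at each shared k-mer."""
    hybrids = set()
    for i, j in shared_regions(first, second, k):
        hybrids.add(first[:i + k] + second[j + k:])
        hybrids.add(second[:j + k] + first[i + k:])
    hybrids.discard(first)
    hybrids.discard(second)
    return hybrids


def best_hybrid(first, second, score, k=6):
    """Screen all hybrids with a caller-supplied score and return the top one (or None)."""
    candidates = sorted(crossover_hybrids(first, second, k))
    return max(candidates, key=score) if candidates else None


# Hypothetical parent sequences sharing the region "GGATCC".
parent_a = "AAAAGGATCCTTTT"
parent_b = "CCCCGGATCCGGGG"
hybrids = crossover_hybrids(parent_a, parent_b)
top = best_hybrid(parent_a, parent_b, score=lambda s: s.count("G"))
```

The scoring callback plays the role of the screening step; in the laboratory method it would be an activity assay on the expressed hybrid polypeptide rather than a function of the sequence text.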

Isolating and discovering cellulase enzymes 

The invention provides methods for isolating and discovering cellulases, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes and the 
nucleic acids that encode them. Polynucleotides or enzymes may be isolated from 
individual organisms ("isolates"), collections of organisms that have been grown in 
defined media ("enrichment cultures"), or, uncultivated organisms ("environmental 
samples"). The organisms can be isolated by, e.g., in vivo biopanning (see discussion, 
below). The use of a culture-independent approach to derive polynucleotides encoding 
novel bioactivities from environmental samples is most preferable since it allows one to 
access untapped resources of biodiversity. Polynucleotides or enzymes also can be 
isolated from any one of numerous organisms, e.g., bacteria. In addition to whole cells, 
polynucleotides or enzymes also can be isolated from crude enzyme preparations derived 
from cultures of these organisms, e.g., bacteria. 

"Environmental libraries" are generated from environmental samples and 
represent the collective genomes of naturally occurring organisms archived in cloning
vectors that can be propagated in suitable prokaryotic hosts. Because the cloned DNA is
initially extracted directly from environmental samples, the libraries are not limited to the
small fraction of prokaryotes that can be grown in pure culture. Additionally, a
normalization of the environmental DNA present in these samples could allow more
equal representation of the DNA from all of the species present in the original sample.
This can dramatically increase the efficiency of finding interesting genes from minor
constituents of the sample which may be under-represented by several orders of
magnitude compared to the dominant species.
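
As a rough illustration of why normalization matters, the number of clones that must be screened to recover a sequence from a given organism can be estimated with the standard library-coverage relation N = ln(1 - P) / ln(1 - f), where f is the fraction of library clones derived from that organism and P is the desired probability of sampling it at least once. The sketch below assumes that each clone is an independent random draw from the library; the function name and the example fractions are hypothetical and are not taken from this specification.

```python
import math


def clones_needed(fraction, probability=0.99):
    """Estimate how many clones must be screened to sample, at least once and
    with the given probability, an organism that contributes `fraction` of the
    library. Assumes every clone is an independent random draw."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


# Hypothetical minor constituent at 0.1% of an un-normalized library,
# versus 10% of the library after an idealized normalization.
before = clones_needed(0.001)
after = clones_needed(0.1)
print(before, after)
```

Under these assumptions, raising the representation of a minor constituent by two orders of magnitude lowers the screening burden by roughly the same factor.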

In one aspect, gene libraries generated from one or more uncultivated
microorganisms are screened for an activity of interest. Potential pathways encoding
bioactive molecules of interest are first captured in prokaryotic cells in the form of gene
expression libraries. In one aspect, polynucleotides encoding activities of interest are
isolated from such libraries and introduced into a host cell. The host cell is grown under 
conditions which promote recombination and/or reductive reassortment creating 
potentially active biomolecules with novel or enhanced activities. 

In vivo biopanning may be performed utilizing FACS-based and non-optical
(e.g., magnetic) based machines. In one aspect, complex gene libraries are constructed
with vectors which contain elements which stabilize transcribed RNA. For example, the
inclusion of sequences which result in secondary structures such as hairpins which are
designed to flank the transcribed regions of the RNA would serve to enhance their
stability, thus increasing their half-life within the cell. The probe molecules used in the
biopanning process consist of oligonucleotides labeled with reporter molecules that only
fluoresce upon binding of the probe to a target molecule. These probes are introduced
into the recombinant cells from the library using one of several transformation methods.
The probe molecules bind to the transcribed target mRNA resulting in DNA/RNA 
heteroduplex molecules. Binding of the probe to a target will yield a fluorescent signal 
which is detected and sorted by the FACS machine during the screening process. 

In one aspect, subcloning is performed to further isolate sequences of interest. In
subcloning, a portion of DNA is amplified, digested, generally by restriction enzymes, to
cut out the desired sequence, the desired sequence is ligated into a recipient vector and is
amplified. At each step in subcloning, the portion is examined for the activity of interest,
in order to ensure that DNA that encodes the structural protein has not been excluded.
The insert may be purified at any step of the subcloning, for example, by gel
electrophoresis prior to ligation into a vector or where cells containing the recipient
vector and cells not containing the recipient vector are placed on selective media
containing, for example, an antibiotic, which will kill the cells not containing the recipient
vector. Specific methods of subcloning cDNA inserts into vectors are well-known in the
art (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring
Harbor Laboratory Press (1989)). In another aspect, the enzymes of the invention are
subclones. Such subclones may differ from the parent clone by, for example, length, a 
mutation, a tag or a label. 

The microorganisms from which the polynucleotide may be discovered, isolated 
or prepared include prokaryotic microorganisms, such as Eubacteria and Archaebacteria
and lower eukaryotic microorganisms such as fungi, some algae and protozoa.
Polynucleotides may be discovered, isolated or prepared from environmental samples in
which case the nucleic acid may be recovered without culturing of an organism or
recovered from one or more cultured organisms. In one aspect, such microorganisms
may be extremophiles, such as hyperthermophiles, psychrophiles, psychrotrophs,
halophiles, barophiles and acidophiles. Polynucleotides encoding enzymes isolated from
extremophilic microorganisms can be used. Enzymes of this invention can function at
temperatures above 100°C, e.g., as those found in terrestrial hot springs and deep sea
thermal vents, or at temperatures below 0°C, e.g., as those found in arctic waters, in a
saturated salt environment, e.g., as those found in the Dead Sea, at pH values around 0,
e.g., as those found in coal deposits and geothermal sulfur-rich springs, or at pH values
greater than 11, e.g., as those found in sewage sludge. In one aspect, enzymes of the
invention have high activity throughout a wide range of temperatures and pHs. 

Polynucleotides selected and isolated as hereinabove described are introduced into
a suitable host cell. A suitable host cell is any cell which is capable of promoting
recombination and/or reductive reassortment. The selected polynucleotides are in one
aspect already in a vector which includes appropriate control sequences. The host cell
can be a higher eukaryotic cell, such as a mammalian cell, or a lower eukaryotic cell, such
as a yeast cell, or in one aspect, the host cell can be a prokaryotic cell, such as a bacterial 
cell. Introduction of the construct into the host cell can be effected by calcium phosphate 
transfection, DEAE-Dextran mediated transfection, or electroporation. 

Exemplary hosts include bacterial cells, such as E. coli, Streptomyces, Salmonella
typhimurium; fungal cells, such as yeast; insect cells such as Drosophila S2 and
Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; adenoviruses; and 
plant cells; see discussion, above. The selection of an appropriate host is deemed to be 
within the scope of those skilled in the art from the teachings herein. 

Various mammalian cell culture systems can be employed to express recombinant
protein; examples of mammalian expression systems include the COS-7 lines of monkey
kidney fibroblasts, described in "SV40-transformed simian cells support the replication of
early SV40 mutants" (Gluzman, 1981) and other cell lines capable of expressing a
compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.
Mammalian expression vectors can comprise an origin of replication, a suitable promoter
and enhancer and also any necessary ribosome binding sites, polyadenylation site, splice
donor and acceptor sites, transcriptional termination sequences and 5' flanking
nontranscribed sequences. DNA sequences derived from the SV40 splice and
polyadenylation sites may be used to provide the required nontranscribed genetic
elements. 

In another aspect, nucleic acids, polypeptides and methods of the invention are 
used in biochemical pathways, or to generate novel polynucleotides encoding
biochemical pathways from one or more operons or gene clusters or portions thereof. For
example, bacteria and many eukaryotes have a coordinated mechanism for regulating
genes whose products are involved in related processes. The genes are clustered, in
structures referred to as "gene clusters," on a single chromosome and are transcribed
together under the control of a single regulatory sequence, including a single promoter
which initiates transcription of the entire cluster. Thus, a gene cluster is a group of
adjacent genes that are either identical or related, usually as to their function (an example
of a biochemical pathway encoded by gene clusters is polyketide biosynthesis).

In one aspect, gene cluster DNA is isolated from different organisms and ligated 
into vectors, e.g., vectors containing expression regulatory sequences which can control
and regulate the production of a detectable protein or protein-related array activity from
the ligated gene clusters. Use of vectors which have an exceptionally large capacity for
exogenous DNA introduction can be appropriate for use with such gene clusters and are
described by way of example herein to include the f-factor (or fertility factor) of E. coli.
This f-factor of E. coli is a plasmid which affects high-frequency transfer of itself during
conjugation and is ideal to achieve and stably propagate large DNA fragments, such as
gene clusters from mixed microbial samples. One aspect is to use cloning vectors,
referred to as "fosmids" or bacterial artificial chromosome (BAC) vectors. These are
derived from E. coli f-factor which is able to stably integrate large segments of genomic
DNA. When integrated with DNA from a mixed uncultured environmental sample, this 
makes it possible to achieve large genomic fragments in the form of a stable 
"environmental DNA library." Another type of vector for use in the present invention is a 
cosmid vector. Cosmid vectors were originally designed to clone and propagate large 
segments of genomic DNA. Cloning into cosmid vectors is described in detail in 
Sambrook et al. Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor 
Laboratory Press (1989). Once ligated into an appropriate vector, two or more vectors 
containing different polyketide synthase gene clusters can be introduced into a suitable 
host cell. Regions of partial sequence homology shared by the gene clusters will promote
processes which result in sequence reorganization resulting in a hybrid gene cluster. The
novel hybrid gene cluster can then be screened for enhanced activities not found in the
original gene clusters. 

Methods for screening for various enzyme activities are known to those of skill in 
the art and are discussed throughout the present specification, see, e.g., Examples 1, 2 and
3, below. Such methods may be employed when isolating the polypeptides and 
polynucleotides of the invention. 

In one aspect, the invention provides methods for discovering and isolating 
cellulases, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase, or 
compounds to modify the activity of these enzymes, using a whole cell approach (see 
discussion, below). Putative clones encoding cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase from a genomic DNA library can be
screened. 

Screening Methodologies and "On-line" Monitoring Devices 

In practicing the methods of the invention, a variety of apparatus and 
methodologies can be used in conjunction with the polypeptides and nucleic acids of
the invention, e.g., to screen polypeptides for cellulase, e.g., endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity, to screen 
compounds as potential modulators, e.g., activators or inhibitors, of a cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity, 
for antibodies that bind to a polypeptide of the invention, for nucleic acids that hybridize 
to a nucleic acid of the invention, to screen for cells expressing a polypeptide of the 
invention and the like. In addition to the array formats described in detail below for 
screening samples, alternative formats can also be used to practice the methods of the
invention. Such formats include, for example, mass spectrometers, chromatographs, e.g.,
high-throughput HPLC and other forms of liquid chromatography, and smaller formats,
such as 1536-well plates, 384-well plates and so on. High throughput screening
apparatus can be adapted and used to practice the methods of the invention, see, e.g., U.S.
Patent Application Nos. 20020001809; 20050272044.

Capillary Arrays 

Nucleic acids or polypeptides of the invention can be immobilized to or applied to
an array. Arrays can be used to screen for or monitor libraries of compositions (e.g.,
small molecules, antibodies, nucleic acids, etc.) for their ability to bind to or modulate the
activity of a nucleic acid or a polypeptide of the invention. Capillary arrays, such as the
GIGAMATRIX™, Diversa Corporation, San Diego, CA; and arrays described in, e.g.,
U.S. Patent Application No. 20020080350 A1; WO 0231203 A; WO 0244336 A, provide
an alternative apparatus for holding and screening samples. In one aspect, the capillary
array includes a plurality of capillaries formed into an array of adjacent capillaries,
wherein each capillary comprises at least one wall defining a lumen for retaining a
sample. The lumen may be cylindrical, square, hexagonal or any other geometric shape
so long as the walls form a lumen for retention of a liquid or sample. The capillaries of
the capillary array can be held together in close proximity to form a planar structure. The
capillaries can be bound together, by being fused (e.g., where the capillaries are made of
glass), glued, bonded, or clamped side-by-side. Additionally, the capillary array can
include interstitial material disposed between adjacent capillaries in the array, thereby
forming a solid planar device containing a plurality of through-holes.

A capillary array can be formed of any number of individual capillaries, for
example, a range from 100 to 4,000,000 capillaries. Further, a capillary array having
about 100,000 or more individual capillaries can be formed into the standard size and
shape of a Microtiter® plate for fitment into standard laboratory equipment. The lumens
are filled manually or automatically using either capillary action or microinjection using a
thin needle. Samples of interest may subsequently be removed from individual capillaries
for further analysis or characterization. For example, a thin, needle-like probe is
positioned in fluid communication with a selected capillary to either add or withdraw
material from the lumen.

In a single-pot screening assay, the assay components are mixed yielding a 
solution of interest, prior to insertion into the capillary array. The lumen is filled by 
capillary action when at least a portion of the array is immersed into a solution of interest. 
Chemical or biological reactions and/or activity in each capillary are monitored for 
detectable events. A detectable event is often referred to as a "hit", which can usually be 
distinguished from "non-hit" producing capillaries by optical detection. Thus, capillary
arrays allow for massively parallel detection of "hits". 
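
The optical distinction between "hit" and "non-hit" capillaries described above could, for example, be reduced to a simple threshold on the signal read from each capillary. The sketch below is illustrative only: the threshold rule (mean of blank capillaries plus a multiple of their standard deviation), the function name and the intensity values are assumptions and are not taken from this specification.

```python
from statistics import mean, stdev


def call_hits(signals, blanks, k=3.0):
    """Return the indices of capillaries whose optical signal exceeds the
    blank mean by more than k blank standard deviations (assumed rule)."""
    threshold = mean(blanks) + k * stdev(blanks)
    return [index for index, signal in enumerate(signals) if signal > threshold]


# Hypothetical intensities: blank capillaries and six screened capillaries.
blank_signals = [10, 12, 11, 9, 10, 11, 10, 9]
screened_signals = [10, 11, 48, 9, 12, 95]
print(call_hits(screened_signals, blank_signals))
```

Because each capillary is evaluated independently, the same rule applies unchanged whether the array holds one hundred or several million capillaries.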

In a multi-pot screening assay, a polypeptide or nucleic acid, e.g., a ligand, can be 
introduced into a first component, which is introduced into at least a portion of a capillary
of a capillary array. An air bubble can then be introduced into the capillary behind the
first component. A second component can then be introduced into the capillary, wherein
the second component is separated from the first component by the air bubble. The first
and second components can then be mixed by applying hydrostatic pressure to both sides
of the capillary array to collapse the bubble. The capillary array is then monitored for a
detectable event resulting from reaction or non-reaction of the two components.

In a binding screening assay, a sample of interest can be introduced as a first 
liquid labeled with a detectable particle into a capillary of a capillary array, wherein the
lumen of the capillary is coated with a binding material for binding the detectable particle
to the lumen. The first liquid may then be removed from the capillary tube, wherein the
bound detectable particle is maintained within the capillary, and a second liquid may be
introduced into the capillary tube. The capillary is then monitored for a detectable event 
resulting from reaction or non-reaction of the particle with the second liquid. 

Arrays, or "Biochips"

Nucleic acids or polypeptides of the invention can be immobilized to or applied to 
an array. Arrays can be used to screen for or monitor libraries of compositions (e.g., 
small molecules, antibodies, nucleic acids, etc.) for their ability to bind to or modulate the
activity of a nucleic acid or a polypeptide of the invention. For example, in one aspect of 
the invention, a monitored parameter is transcript expression of a cellulase, e.g., 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme gene. One 
or more, or, all the transcripts of a cell can be measured by hybridization of a sample 
comprising transcripts of the cell, or, nucleic acids representative of or complementary to 
transcripts of a cell, by hybridization to immobilized nucleic acids on an array, or 
"biochip." By using an "array" of nucleic acids on a microchip, some or all of the 
transcripts of a cell can be simultaneously quantified. Alternatively, arrays comprising 
genomic nucleic acid can also be used to determine the genotype of a newly engineered 
strain made by the methods of the invention. "Polypeptide arrays" can also be used to
simultaneously quantify a plurality of proteins. The present invention can be practiced
with any known "array," also referred to as a "microarray" or "nucleic acid array" or
"polypeptide array" or "antibody array" or "biochip," or variation thereof. Arrays are 
generically a plurality of "spots" or "target elements," each target element comprising a 
defined amount of one or more biological molecules, e.g., oligonucleotides, immobilized 
onto a defined area of a substrate surface for specific binding to a sample molecule, e.g., 
mRNA transcripts. 

The terms "array" or "microarray" or "biochip" or "chip" as used herein refer to a
plurality of target elements, each target element comprising a defined amount of one or
more polypeptides (including antibodies) or nucleic acids immobilized onto a defined
area of a substrate surface, as discussed in further detail, below.

In practicing the methods of the invention, any known array and/or method of
making and using arrays can be incorporated in whole or in part, or variations thereof, as
described, for example, in U.S. Patent Nos. 6,277,628; 6,277,489; 6,261,776; 6,258,606;
6,054,270; 6,048,695; 6,045,996; 6,022,963; 6,013,440; 5,965,452; 5,959,098; 5,856,174;
5,830,645; 5,770,456; 5,632,957; 5,556,752; 5,143,854; 5,807,522; 5,800,992; 5,744,305;
5,700,637; 5,556,752; 5,434,049; see also, e.g., WO 99/51773; WO 99/09217; WO
97/46313; WO 96/17958; see also, e.g., Johnston (1998) Curr. Biol. 8:R171-R174;
Schummer (1997) Biotechniques 23:1087-1092; Kern (1997) Biotechniques 23:120-124;
Solinas-Toldo (1997) Genes, Chromosomes & Cancer 20:399-407; Bowtell (1999)
Nature Genetics Supp. 21:25-32. See also published U.S. patent applications Nos.
20010018642; 20010019827; 20010016322; 20010014449; 20010014448; 20010012537; 
20010008765. 

Antibodies and Antibody-based screening methods

The invention provides isolated or recombinant antibodies that specifically bind to 
a cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme of the invention. These antibodies can be used to isolate, identify or quantify the
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzymes of the invention or related polypeptides. These antibodies can be used to isolate
other polypeptides within the scope of the invention or other related cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes. The
antibodies can be designed to bind to an active site of a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme. Thus, the invention
provides methods of inhibiting cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzymes using the antibodies of the invention (see 
discussion above regarding applications for anti-cellulase, e.g., anti-endoglucanase, anti- 
cellobiohydrolase and/or anti-beta-glucosidase enzyme compositions of the invention). 

The term "antibody" includes a peptide or polypeptide derived from, modeled
after or substantially encoded by an immunoglobulin gene or immunoglobulin genes, or
fragments thereof, capable of specifically binding an antigen or epitope, see, e.g.,
Fundamental Immunology, Third Edition, W.E. Paul, ed., Raven Press, N.Y. (1993);
Wilson (1994) J. Immunol. Methods 175:267-273; Yarmush (1992) J. Biochem.
Biophys. Methods 25:85-97. The term antibody includes antigen-binding portions, i.e.,
"antigen binding sites," (e.g., fragments, subsequences, complementarity determining
regions (CDRs)) that retain capacity to bind antigen, including (i) a Fab fragment, a
monovalent fragment consisting of the VL, VH, CL and CH1 domains; (ii) a F(ab')2
fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge
at the hinge region; (iii) a Fd fragment consisting of the VH and CH1 domains; (iv) a Fv
fragment consisting of the VL and VH domains of a single arm of an antibody; (v) a dAb
fragment (Ward et al., (1989) Nature 341:544-546), which consists of a VH domain; and
(vi) an isolated complementarity determining region (CDR). Single chain antibodies are
also included by reference in the term "antibody."

The invention provides fragments of the enzymes of the invention (e.g., peptides) 
including immunogenic fragments (e.g., subsequences) of a polypeptide of the invention. 

The invention provides compositions comprising a polypeptide or peptide of the
invention and adjuvants or carriers and the like. 

The antibodies can be used in immunoprecipitation, staining, immunoaffinity
columns, and the like. If desired, nucleic acid sequences encoding for specific antigens
can be generated by immunization followed by isolation of polypeptide or nucleic acid,
amplification or cloning and immobilization of polypeptide onto an array of the
invention. Alternatively, the methods of the invention can be used to modify the structure
of an antibody produced by a cell to be modified, e.g., an antibody's affinity can be 
increased or decreased. Furthermore, the ability to make or modify antibodies can be a 
phenotype engineered into a cell by the methods of the invention. 

Methods of immunization, producing and isolating antibodies (polyclonal and
monoclonal) are known to those of skill in the art and described in the scientific and
patent literature, see, e.g., Coligan, CURRENT PROTOCOLS IN IMMUNOLOGY,
Wiley/Greene, NY (1991); Stites (eds.) BASIC AND CLINICAL IMMUNOLOGY (7th
ed.) Lange Medical Publications, Los Altos, CA ("Stites"); Goding, MONOCLONAL
ANTIBODIES: PRINCIPLES AND PRACTICE (2d ed.) Academic Press, New York,
NY (1986); Kohler (1975) Nature 256:495; Harlow (1988) ANTIBODIES, A
LABORATORY MANUAL, Cold Spring Harbor Publications, New York. Antibodies
also can be generated in vitro, e.g., using recombinant antibody binding site expressing
phage display libraries, in addition to the traditional in vivo methods using animals. See,
e.g., Hoogenboom (1997) Trends Biotechnol. 15:62-70; Katz (1997) Annu. Rev. Biophys.
Biomol. Struct. 26:27-45.

The polypeptides of the invention or fragments comprising at least 5, 10, 15, 20,
25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, may also be used to
generate antibodies which bind specifically to the polypeptides or fragments. The
resulting antibodies may be used in immunoaffinity chromatography procedures to isolate
or purify the polypeptide or to determine whether the polypeptide is present in a
biological sample. In such procedures, a protein preparation, such as an extract, or a
biological sample is contacted with an antibody capable of specifically binding to one of
the polypeptides of the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30,
35, 40, 50, 75, 100, or 150 consecutive amino acids thereof.
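
By way of illustration only, the candidate fragments of a given number of consecutive amino acids can be enumerated from a polypeptide sequence with a sliding window. The sketch below assumes a one-letter amino acid string; the function name and the 12-residue peptide are hypothetical placeholders and do not correspond to any sequence of the invention.

```python
def consecutive_fragments(sequence, length):
    """Return every run of `length` consecutive residues found in `sequence`,
    in order of starting position."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    return [sequence[start:start + length]
            for start in range(len(sequence) - length + 1)]


# Hypothetical 12-residue peptide, used only to show the windowing.
peptide = "MKTAYIAKQRQI"
fragments = consecutive_fragments(peptide, 5)
print(len(fragments), fragments[0], fragments[-1])
```

A sequence shorter than the requested length yields an empty list, so the same call can be used across all of the fragment lengths recited above.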

In immunoaffinity procedures, the antibody is attached to a solid support, such as a
bead or other column matrix. The protein preparation is placed in contact with the
antibody under conditions in which the antibody specifically binds to one of the
polypeptides of the invention, or fragment thereof. After a wash to remove
non-specifically bound proteins, the specifically bound polypeptides are eluted.

The ability of proteins in a biological sample to bind to the antibody may be
determined using any of a variety of procedures familiar to those skilled in the art. For
example, binding may be determined by labeling the antibody with a detectable label such
as a fluorescent agent, an enzymatic label, or a radioisotope. Alternatively, binding of the
antibody to the sample may be detected using a secondary antibody having such a
detectable label thereon. Particular assays include ELISA assays, sandwich assays,
radioimmunoassays and Western Blots. 

Polyclonal antibodies generated against the polypeptides of the invention, or
fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive
amino acids thereof can be obtained by direct injection of the polypeptides into an animal
or by administering the polypeptides to an animal, for example, a nonhuman. The
antibody so obtained can bind the polypeptide itself. In this manner, even a sequence
encoding only a fragment of the polypeptide can be used to generate antibodies which
may bind to the whole native polypeptide. Such antibodies can then be used to isolate the
polypeptide from cells expressing that polypeptide. 

For preparation of monoclonal antibodies, any technique which provides
antibodies produced by continuous cell line cultures can be used. Examples include the
hybridoma technique (Kohler and Milstein, Nature, 256:495-497, 1975), the trioma
technique, the human B-cell hybridoma technique (Kozbor et al., Immunology Today
4:72, 1983) and the EBV-hybridoma technique (Cole, et al., 1985, in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 

Techniques described for the production of single chain antibodies (U.S. Patent
No. 4,946,778) can be adapted to produce single chain antibodies to the polypeptides of
the invention, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or
150 consecutive amino acids thereof. Alternatively, transgenic mice may be used to
express humanized antibodies to these polypeptides or fragments thereof.

Antibodies generated against the polypeptides of the invention, or fragments 
comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino 
acids thereof may be used in screening for similar polypeptides from other organisms and 
samples. In such techniques, polypeptides from the organism are contacted with the 
antibody and those polypeptides which specifically bind the antibody are detected. Any 
of the procedures described above may be used to detect antibody binding. One such 
screening assay is described in "Methods for Measuring Cellulase Activities", Methods in
Enzymology, Vol. 160, pp. 87-116.

Kits 

The invention provides kits comprising the compositions, e.g., nucleic acids, 
expression cassettes, vectors, cells, transgenic seeds or plants or plant parts, polypeptides 
(e.g., a cellulase enzyme) and/or antibodies of the invention. The kits also can contain 
instructional material teaching the methodologies and industrial, medical and dietary uses
of the invention, as described herein. 

Whole cell engineering and measuring metabolic parameters

The methods of the invention provide whole cell evolution, or whole cell 
engineering, of a cell to develop a new cell strain having a new phenotype, e.g., a new or 
modified cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta- 
glucosidase enzyme activity, by modifying the genetic composition of the cell. See U.S. 
patent application no. 20040033975. 

The genetic composition can be modified by addition to the cell of a nucleic acid 
of the invention, e.g., a coding sequence for an enzyme of the invention. See, e.g., 
WO0229032; WO0196551. 

To detect the new phenotype, at least one metabolic parameter of a modified cell 
is monitored in the cell in a "real time" or "on-line" time frame. In one aspect, a plurality 
of cells, such as a cell culture, is monitored in "real time" or "on-line." In one aspect, a
plurality of metabolic parameters is monitored in "real time" or "on-line." Metabolic
parameters can be monitored using the cellulase, e.g., endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase enzymes of the invention.

Metabolic flux analysis (MFA) is based on a known biochemistry framework. A
linearly independent metabolic matrix is constructed based on the law of mass
conservation and on the pseudo-steady state hypothesis (PSSH) on the intracellular
metabolites. In practicing the methods of the invention, metabolic networks are
established, including the:

• identity of all pathway substrates, products and intermediary metabolites,

• identity of all the chemical reactions interconverting the pathway metabolites,
the stoichiometry of the pathway reactions,

• identity of all the enzymes catalyzing the reactions, the enzyme reaction kinetics,

• the regulatory interactions between pathway components, e.g., allosteric
interactions, enzyme-enzyme interactions, etc.,

• intracellular compartmentalization of enzymes or any other supramolecular
organization of the enzymes, and,

• the presence of any concentration gradients of metabolites, enzymes or effector
molecules or diffusion barriers to their movement.

Once the metabolic network for a given strain is built, a mathematical representation in
matrix notation can be introduced to estimate the intracellular metabolic fluxes if
on-line metabolome data are available. Metabolic phenotype relies on the changes of the
whole metabolic network within a cell, and on the change of
pathway utilization with respect to environmental conditions, genetic regulation,
developmental state, genotype, etc. In one aspect of the methods of the invention,
after the on-line MFA calculation, the dynamic behavior of the cells, their phenotype and
other properties are analyzed by investigating the pathway utilization. For example, if the
glucose supply is increased and the oxygen decreased during a yeast fermentation, the
utilization of respiratory pathways will be reduced and/or stopped, and the utilization of
the fermentative pathways will dominate. Control of the physiological state of cell cultures
becomes possible after the pathway analysis. The methods of the invention can help
determine how to manipulate the fermentation by determining how to change the
substrate supply, temperature, use of inducers, etc. to control the physiological state of
cells so that it moves in a desirable direction. In practicing the methods of the invention, the
MFA results can also be compared with transcriptome and proteome data to design
experiments and protocols for metabolic engineering or gene shuffling, etc.
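
The matrix formulation of metabolic flux analysis described above can be illustrated with a minimal sketch. It assumes a hypothetical four-reaction yeast network (glucose uptake, glycolysis, respiration, fermentation) with two internal metabolites, a steady-state balance S·v = 0, and two measured fluxes; the network, the flux values and the function names `solve_linear` and `estimate_fluxes` are illustrative assumptions, not part of the disclosure. A practical MFA would typically involve larger networks and a least-squares fit to overdetermined data.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Build the augmented matrix [A | b].
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("system is singular; more measurements are needed")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [x / pivot_value for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def estimate_fluxes(S, measured):
    """Estimate unmeasured fluxes from the steady-state balance S v = 0.

    S: stoichiometric matrix (rows = internal metabolites, columns = reactions).
    measured: dict mapping reaction index -> measured flux.
    """
    n_rxn = len(S[0])
    unknown = [j for j in range(n_rxn) if j not in measured]
    if len(unknown) != len(S):
        raise ValueError("this sketch only handles exactly determined systems")
    # Split S v = 0 into S_u v_u = -S_m v_m.
    A = [[row[j] for j in unknown] for row in S]
    b = [-sum(row[j] * flux for j, flux in measured.items()) for row in S]
    fluxes = dict(measured)
    fluxes.update(zip(unknown, solve_linear(A, b)))
    return [fluxes[j] for j in range(n_rxn)]


# Hypothetical network (columns): v0 glucose uptake -> G6P, v1 G6P -> 2 pyruvate,
# v2 pyruvate -> CO2 (respiration), v3 pyruvate -> ethanol (fermentation).
S = [
    [1, -1, 0, 0],   # G6P balance
    [0, 2, -1, -1],  # pyruvate balance
]
# Assumed measurements: glucose uptake (v0) and ethanol secretion (v3).
fluxes = estimate_fluxes(S, {0: 10.0, 3: 15.0})
respiratory_share = fluxes[2] / (fluxes[2] + fluxes[3])
```

With these assumed measurements the pyruvate balance assigns one quarter of the pyruvate flux to respiration; raising the measured ethanol flux at a fixed glucose uptake lowers that share, which is the kind of shift in pathway utilization the text describes.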

In practicing the methods of the invention, any modified or new phenotype can be
conferred and detected, including new or improved characteristics in the cell. Any aspect
of metabolism or growth can be monitored.

Monitoring expression of an mRNA transcript 

In one aspect of the invention, the engineered phenotype comprises increasing or
decreasing the expression of an mRNA transcript (e.g., a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme message) or generating
new (e.g., cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzyme) transcripts in a cell. This increased or decreased expression can be
traced by testing for the presence of a cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzyme of the invention or by cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity
assays. mRNA transcripts, or messages, also can be detected and quantified by any
method known in the art, including, e.g., Northern blots, quantitative amplification
reactions, hybridization to arrays, and the like. Quantitative amplification reactions
include, e.g., quantitative PCR, including, e.g., quantitative reverse transcription
polymerase chain reaction, or RT-PCR; quantitative real time RT-PCR, or "real-time
kinetic RT-PCR" (see, e.g., Kreuzer (2001) Br. J. Haematol. 114:313-318; Xia (2001)
Transplantation 72:907-914).

In one aspect of the invention, the engineered phenotype is generated by 
knocking out expression of a homologous gene. The gene's coding sequence or one or 
more transcriptional control elements can be knocked out, e.g., promoters or enhancers. 
Thus, the expression of a transcript can be completely ablated or only decreased. 

In one aspect of the invention, the engineered phenotype comprises increasing
the expression of a homologous gene. This can be effected by knocking out a negative
control element, including a transcriptional regulatory element acting in cis- or trans-, or
by mutagenizing a positive control element. One or more, or all, of the transcripts of a cell can
be measured by hybridizing a sample comprising transcripts of the cell, or nucleic
acids representative of or complementary to transcripts of a cell, to
immobilized nucleic acids on an array.

Monitoring expression of polypeptides, peptides and amino acids

In one aspect of the invention, the engineered phenotype comprises increasing or
decreasing the expression of a polypeptide (e.g., a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme) or generating new
polypeptides in a cell. This increased or decreased expression can be traced by
determining the amount of cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme present or by cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme activity assays.

Polypeptides, peptides and amino acids also can be detected and quantified by any
method known in the art, including, e.g., nuclear magnetic resonance (NMR),
spectrophotometry, radiography (protein radiolabeling), electrophoresis, capillary
electrophoresis, high performance liquid chromatography (HPLC), thin layer
chromatography (TLC), hyperdiffusion chromatography, various immunological
methods, e.g., immunoprecipitation, immunodiffusion, immuno-electrophoresis,
radioimmunoassays (RIAs), enzyme-linked immunosorbent assays (ELISAs),
immuno-fluorescent assays, gel electrophoresis (e.g., SDS-PAGE), staining with antibodies,
fluorescent activated cell sorter (FACS), pyrolysis mass spectrometry, Fourier-Transform
Infrared Spectrometry, Raman spectrometry, GC-MS, and LC-Electrospray and
cap-LC-tandem-electrospray mass spectrometries, and the like. Novel bioactivities can also be
screened using methods, or variations thereof, described in U.S. Patent No. 6,057,103.
Furthermore, as discussed below in detail, one or more, or all, of the polypeptides of a cell
can be measured using a protein array.

Industrial, Energy, Pharmaceutical and other Applications

Polypeptides of the invention (e.g., having cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity) can catalyze the breakdown of
cellulose. The enzymes of the invention can be highly selective catalysts. The invention
provides industrial processes using enzymes of the invention, e.g., in the pharmaceutical
or nutrient (diet) supplement industry, the energy industry (e.g., to make "clean"
biofuels), in the food and feed industries, e.g., in methods for making food and feed
products and food and feed additives. In one aspect, the invention provides processes
using enzymes of the invention in the medical industry, e.g., to make pharmaceuticals or
dietary aids or supplements, or food supplements and additives. In addition, the invention
provides methods for using the enzymes of the invention in bioethanol, including "clean"
fuel, production.

The enzymes of the invention can catalyze reactions with exquisite stereo-, regio-
and chemo-selectivities. The cellulase, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzymes of the invention can be engineered to
function in various solvents, operate at extreme pHs (for example, high pHs and low pHs),
extreme temperatures (for example, high temperatures and low temperatures), extreme
salinity levels (for example, high salinity and low salinity) and catalyze reactions with
compounds that are structurally unrelated to their natural, physiological substrates.

Biomass conversion and production of clean biofuels

The invention provides enzymes and methods for the conversion of biomass (e.g.,
lignocellulosic materials) to fuels (e.g., bioethanol) and chemicals. Thus, the
compositions and methods of the invention provide effective and sustainable alternatives
to use of petroleum-based products. The invention provides organisms expressing
enzymes of the invention for participation in chemical cycles involving natural biomass
conversion. In one aspect, enzymes and methods for the conversion are used in enzyme
ensembles for the efficient depolymerization of cellulosic and hemicellulosic polymers to
metabolizable carbon moieties. As discussed above, the invention provides methods for
discovering and implementing the most effective of enzymes to enable these important
new "biomass conversion" and alternative energy industrial processes.

In one aspect, the polypeptides of the invention, e.g., proteins having cellulase
activity, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
activity, are used in processes for converting lignocellulosic biomass to ethanol. The
invention also provides processes for making ethanol ("bioethanol") from compositions
comprising lignocellulosic biomass. The lignocellulose biomass material can be obtained
from agricultural crops, as a byproduct of food or feed production, or as lignocellulosic
waste products, such as plant residues and waste paper. Examples of suitable plant
residues for treatment with polypeptides of the invention include stems, leaves, hulls,
husks, cobs and the like, as well as wood, wood chips, wood pulp, and sawdust. Examples
of paper waste suitable for treatment with polypeptides of the invention include discard
photocopy paper, computer printer paper, notebook paper, notepad paper, typewriter
paper, and the like, as well as newspapers, magazines, cardboard, and paper-based
packaging materials.

In one aspect, the enzymes and methods of the invention can be used in
conjunction with more "traditional" means of making ethanol from biomass, e.g., as
methods comprising hydrolyzing lignocellulosic materials by subjecting dried
lignocellulosic material in a reactor to a catalyst comprised of a dilute solution of a strong
acid and a metal salt; this can lower the activation energy, or the temperature, of cellulose
hydrolysis to obtain higher sugar yields; see, e.g., U.S. Patent Nos. 6,660,506; 6,423,145.

Another exemplary method that incorporates use of enzymes of the invention
comprises hydrolyzing lignocellulosic material containing hemicellulose, cellulose and
lignin by subjecting the material to a first stage hydrolysis step in an aqueous medium at a
temperature and a pressure chosen to effect primarily depolymerization of hemicellulose
without major depolymerization of cellulose to glucose. This step results in a slurry in
which the liquid aqueous phase contains dissolved monosaccharides resulting from
depolymerization of hemicellulose and a solid phase containing cellulose and lignin. A
second stage hydrolysis step can comprise conditions such that at least a major portion of
the cellulose is depolymerized, such step resulting in a liquid aqueous phase containing
dissolved/soluble depolymerization products of cellulose. See, e.g., U.S. Patent No.
5,536,325. Enzymes of the invention can be added at any stage of this exemplary
process.

Another exemplary method that incorporates use of enzymes of the invention
comprises processing a lignocellulose-containing biomass material by one or more stages
of dilute acid hydrolysis with about 0.4% to 2% strong acid; and treating an unreacted
solid lignocellulosic component of the acid hydrolyzed biomass material by alkaline
delignification to produce precursors for biodegradable thermoplastics and derivatives.
See, e.g., U.S. Patent No. 6,409,841. Enzymes of the invention can be added at any stage
of this exemplary process.

Another exemplary method that incorporates use of enzymes of the invention
comprises prehydrolyzing lignocellulosic material in a prehydrolysis reactor, adding an
acidic liquid to the solid lignocellulosic material to make a mixture; heating the mixture
to reaction temperature; maintaining reaction temperature for time sufficient to fractionate
the lignocellulosic material into a solubilized portion containing at least about 20% of the
lignin from the lignocellulosic material and a solid fraction containing cellulose;
removing a solubilized portion from the solid fraction while at or near reaction
temperature wherein the cellulose in the solid fraction is rendered more amenable to
enzymatic digestion; and recovering a solubilized portion. See, e.g., U.S. Patent No.
5,705,369. Enzymes of the invention can be added at any stage of this exemplary
process.

The invention provides methods for making motor fuel compositions (e.g., for
spark ignition motors) based on liquid hydrocarbons blended with a fuel grade alcohol
made by using an enzyme or a method of the invention. In one aspect, the fuels made by
use of an enzyme of the invention comprise, e.g., coal gas liquid- or natural gas
liquid-ethanol blends. In one aspect, a co-solvent is biomass-derived 2-methyltetrahydrofuran
(MTHF). See, e.g., U.S. Patent No. 6,712,866.

Methods of the invention for the enzymatic degradation of lignocellulose, e.g., for
production of ethanol from lignocellulosic material, can also comprise use of ultrasonic
treatment of the biomass material; see, e.g., U.S. Patent No. 6,333,181.

Another exemplary process for making a biofuel comprising ethanol using
enzymes of the invention comprises pretreating a starting material comprising a
lignocellulosic feedstock comprising at least hemicellulose and cellulose. In one aspect,
the starting material comprises potatoes, soybean (rapeseed), barley, rye, corn, oats,
wheat, beets or sugar cane or a component or waste or food or feed production byproduct.
The starting material ("feedstock") is reacted at conditions which disrupt the plant's fiber
structure to effect at least a partial hydrolysis of the hemicellulose and cellulose.
Disruptive conditions can comprise, e.g., subjecting the starting material to an average
temperature of 180°C to 270°C at pH 0.5 to 2.5 for a period of about 5 seconds to 60
minutes; or, temperature of 220°C to 270°C, at pH 0.5 to 2.5 for a period of 5 seconds to
120 seconds, or equivalent. This generates a feedstock with increased accessibility to
being digested by an enzyme, e.g., a cellulase enzyme of the invention. See, e.g., U.S. Patent No.
6,090,595.

Exemplary conditions for cellulase hydrolysis of lignocellulosic material include
reactions at temperatures between about 30°C and 48°C, and/or a pH between about 4.0
and 6.0. Other exemplary conditions include a temperature between about 30°C and 60°C
and a pH between about 4.0 and 8.0.
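
The two exemplary condition windows above (about 30°C to 48°C at about pH 4.0 to 6.0, and about 30°C to 60°C at about pH 4.0 to 8.0) can be captured as a small lookup, e.g., for screening planned reaction conditions. The following is a minimal sketch only: the names `CONDITION_WINDOWS` and `matching_windows` are hypothetical, the "about" limits are treated as exact bounds, and the "and/or" of the first window is simplified to require both temperature and pH to fall in range.

```python
# Exemplary cellulase hydrolysis windows taken from the text:
# (min temperature in °C, max temperature in °C, min pH, max pH).
# The window names are illustrative labels, not terms used in the disclosure.
CONDITION_WINDOWS = {
    "narrow": (30.0, 48.0, 4.0, 6.0),
    "broad": (30.0, 60.0, 4.0, 8.0),
}


def matching_windows(temp_c, ph):
    """Return the names of the exemplary windows that contain the given conditions."""
    return [
        name
        for name, (t_lo, t_hi, ph_lo, ph_hi) in CONDITION_WINDOWS.items()
        if t_lo <= temp_c <= t_hi and ph_lo <= ph <= ph_hi
    ]


mild = matching_windows(45.0, 5.0)      # inside both windows
warm = matching_windows(55.0, 7.0)      # inside the broader window only
too_hot = matching_windows(70.0, 5.0)   # outside both windows
```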

Animal feeds and food or feed additives 

In addition to providing dietary aids or supplements, or food supplements and
additives for human use, the invention also provides compositions and methods for
treating animal feeds and foods and food or feed additives using a polypeptide of the
invention, e.g., a protein having cellulase activity, e.g., endoglucanase, cellobiohydrolase,
mannanase and/or beta-glucosidase enzymes of the invention, and/or the antibodies of the
invention. The invention provides animal feeds, foods, and additives comprising
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzymes of the invention and/or antibodies of the invention. The animal can be any farm
animal or any animal.

The animal feed additive of the invention may be a granulated enzyme product
that may readily be mixed with feed components. Alternatively, feed additives of the
invention can form a component of a pre-mix. The granulated enzyme product of the
invention may be coated or uncoated. The particle size of the enzyme granulates can be
compatible with that of feed and pre-mix components. This provides a safe and
convenient means of incorporating enzymes into feeds. Alternatively, the animal feed
additive of the invention may be a stabilized liquid composition. This may be an aqueous
or oil-based slurry. See, e.g., U.S. Patent No. 6,245,546.

Cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzymes of the present invention, in the modification of animal feed or a
food, can process the food or feed either in vitro (by modifying components of the feed or
food) or in vivo. Polypeptides of the invention can be added to animal feed or food
compositions.

In one aspect, an enzyme of the invention is added in combination with another
enzyme, e.g., beta-galactosidases, catalases, laccases, other cellulases, endoglycosidases,
endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases,
glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases,
endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases, reductases,
oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases, arabinanases,
hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases,
rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases,
polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases,
transglutaminases, pectin methylesterases, other cellobiohydrolases and/or
transglutaminases. These enzyme digestion products are more digestible by the animal.
Thus, cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase enzymes of the invention can contribute to the available energy of the feed or
food, or to the digestibility of the food or feed by breaking down cellulose.

In another aspect, cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme of the invention can be supplied by expressing the
enzymes directly in transgenic feed crops (as, e.g., transgenic plants, seeds and the like),
such as grains, cereals, corn, soybean, rape seed, lupin and the like. As discussed above,
the invention provides transgenic plants, plant parts and plant cells comprising a nucleic
acid sequence encoding a polypeptide of the invention. In one aspect, the nucleic acid is
expressed such that the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme of the invention is produced in recoverable quantities.
The cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme can be recovered from any plant or plant part. Alternatively, the plant or plant
part containing the recombinant polypeptide can be used as such for improving the
quality of a food or feed, e.g., improving nutritional value, palatability, etc.

In one aspect, the enzyme delivery matrix of the invention is in the form of
discrete plural particles, pellets or granules. By "granules" is meant particles that are
compressed or compacted, such as by a pelletizing, extrusion, or similar compacting to
remove water from the matrix. Such compression or compacting of the particles also
promotes intraparticle cohesion of the particles. For example, the granules can be
prepared by pelletizing the grain-based substrate in a pellet mill. The pellets prepared
thereby are ground or crumbled to a granule size suitable for use as an adjuvant in animal
feed. Since the matrix is itself approved for use in animal feed, it can be used as a diluent
for delivery of enzymes in animal feed.

In one aspect, the cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme contained in the invention enzyme delivery matrix and
methods is a thermostable cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase
and/or beta-glucosidase enzyme, as described herein, so as to resist inactivation of the
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzyme during manufacture where elevated temperatures and/or steam may be employed
to prepare the pelletized enzyme delivery matrix. During digestion of feed containing the
invention enzyme delivery matrix, aqueous digestive fluids will cause release of the
active enzyme. Other types of thermostable enzymes and nutritional supplements that are
thermostable can also be incorporated in the delivery matrix for release under any type of
aqueous conditions.

In one aspect, a coating is applied to the enzyme matrix particles for many
different purposes, such as to add a flavor or nutrition supplement to animal feed, to delay
release of animal feed supplements and enzymes in gastric conditions, and the like. In
one aspect, the coating is applied to achieve a functional goal, for example, whenever it is
desirable to slow release of the enzyme from the matrix particles or to control the
conditions under which the enzyme will be released. The composition of the coating
material can be such that it is selectively broken down by an agent to which it is
susceptible (such as heat, acid or base, enzymes or other chemicals). Alternatively, two or
more coatings susceptible to different such breakdown agents may be consecutively
applied to the matrix particles.

The invention is also directed towards a process for preparing an
enzyme-releasing matrix. In accordance with the invention, the process comprises providing
discrete plural particles of a grain-based substrate in a particle size suitable for use as an
enzyme-releasing matrix, wherein the particles comprise a cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzyme encoded by an amino acid
sequence of the invention. In one aspect, the process includes compacting or
compressing the particles of enzyme-releasing matrix into granules, which in one
aspect is accomplished by pelletizing. The mold inhibitor and cohesiveness agent, when
used, can be added at any suitable time, and in one aspect are mixed with the grain-based
substrate in the desired proportions prior to pelletizing of the grain-based substrate.

Moisture content in the pellet mill feed in one aspect is in the ranges set forth above with
respect to the moisture content in the finished product, and in one aspect is about 14-15%.
In one aspect, moisture is added to the feedstock in the form of an aqueous preparation of
the enzyme to bring the feedstock to this moisture content. The temperature in the pellet
mill in one aspect is brought to about 82°C with steam. The pellet mill may be operated
under any conditions that impart sufficient work to the feedstock to provide pellets. The
pelleting process itself is a cost-effective process for removing water from the
enzyme-containing composition.
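
The moisture adjustment described above reduces to a simple mass balance. As a worked sketch, assume moisture is expressed on a wet basis and that the aqueous enzyme preparation can be treated as water (both are assumptions, as are the function name `water_to_add` and the example numbers). For a feed mass F at initial moisture fraction m0 and a target fraction m1, the mass W to add satisfies (F·m0 + W)/(F + W) = m1, so W = F·(m1 − m0)/(1 − m1).

```python
def water_to_add(feed_mass_kg, initial_moisture, target_moisture):
    """Mass of aqueous preparation (treated as water) needed to reach the target moisture.

    Moisture values are wet-basis mass fractions, e.g. 0.145 for 14.5%.
    """
    if not 0.0 <= initial_moisture <= target_moisture < 1.0:
        raise ValueError("expected 0 <= initial_moisture <= target_moisture < 1")
    return feed_mass_kg * (target_moisture - initial_moisture) / (1.0 - target_moisture)


# Hypothetical batch: 1000 kg of grain-based feedstock at 12% moisture,
# brought to 14.5% (within the "about 14-15%" range given in the text).
added_kg = water_to_add(1000.0, 0.12, 0.145)
final_moisture = (1000.0 * 0.12 + added_kg) / (1000.0 + added_kg)
```

For this hypothetical batch roughly 29 kg of aqueous preparation is required, and recomputing the moisture of the combined mass returns the 14.5% target.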

The compositions and methods of the invention can be practiced in conjunction
with administration of prebiotics, which are high molecular weight sugars, e.g.,
fructo-oligosaccharides (FOS); galacto-oligosaccharides (GOS), GRAS (Generally Recognized
As Safe) material. These prebiotics can be metabolized by some probiotic lactic acid
bacteria (LAB). They are non-digestible by the majority of intestinal microbes.

Treating foods and food processing 

The invention provides foods and feeds comprising enzymes of the invention, and
methods for using enzymes of the invention in processing foods and feeds. Cellulases,
e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of
the invention have numerous applications in the food processing industry. The invention
provides methods for hydrolyzing cellulose-comprising compositions, including, e.g., a
plant cell, a bacterial cell, a yeast cell, an insect cell, or an animal cell, or any plant or
plant part, or any food or feed, a waste product and the like.

For example, the invention provides feeds or foods comprising a cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzyme of the
invention, e.g., in a feed, a liquid, e.g., a beverage (such as a fruit juice or a beer), a bread
or a dough or a bread product, or a drink (e.g., a beer) or a beverage precursor (e.g., a
wort).

The food treatment processes of the invention can also include the use of any
combination of other enzymes such as tryptophanases or tyrosine decarboxylases,
laccases, catalases, laccases, other cellulases, endoglycosidases, endo-beta-1,4-laccases,
amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases,
phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases,
peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases,
phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases,
xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases,
proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases,
galactanases, pectin lyases, transglutaminases, pectin methylesterases, other
cellobiohydrolases and/or transglutaminases.

In one aspect, the invention provides enzymes and processes for hydrolyzing
liquid (liquefied) and granular starch. Such starch can be derived from any source, e.g.,
beet, cane sugar, potato, corn, wheat, milo, sorghum, rye or bulgur. The invention
applies to any plant starch source, e.g., a grain starch source, which is useful in
liquefaction (for example, to make bioethanol), including any other grain or vegetable
source known to produce starch suitable for liquefaction. The methods of the invention
comprise liquefying starch (e.g., making bioethanol) from any natural material, such as
rice, germinated rice, corn, barley, milo, wheat, legumes, potato, beet, cane sugar and
sweet potato. The liquefying process can substantially hydrolyze the starch to produce a
syrup. The temperature range of the liquefaction can be any liquefaction temperature
which is known to be effective in liquefying starch. For example, the temperature of the
starch can be between about 80°C to about 115°C, between about 100°C to about 110°C,
and from about 105°C to about 108°C. The bioethanols made using the enzymes and
processes of the invention can be used as fuels or in fuels (e.g., auto fuels), e.g., as
discussed below, in addition to their use in (or for making) foods and feeds, including
alcoholic beverages.

Waste treatment

The invention provides enzymes for use in waste treatment. Cellulases, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of the
invention can be used in a variety of waste treatment or related industrial applications,
e.g., in waste treatment related to biomass conversion to generate fuels. For example, in
one aspect, the invention provides a solid and/or liquid waste digestion process using
cellulase, e.g., endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase
enzymes of the invention. The methods can comprise reducing the mass and volume of
substantially untreated solid waste. Solid waste can be treated with an enzymatic
digestive process in the presence of an enzymatic solution (including cellulase, e.g.,
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase enzymes of the
invention) at a controlled temperature. This results in a reaction without appreciable
bacterial fermentation from added microorganisms. The solid waste is converted into a
liquefied waste and any residual solid waste. The resulting liquefied waste can be
separated from said any residual solidified waste. See, e.g., U.S. Patent No. 5,709,796.

In one aspect, the compositions and methods of the invention are used for odor 
removal, odor prevention or odor reduction, e.g., in animal waste lagoons, e.g., on swine 
farms, in other animal waste management systems, or in any industrial or food processing 
application. 

The enzymes and methods for the conversion of biomass (e.g., lignocellulosic
materials) to fuels (e.g., bioethanol) can incorporate the treatment/recycling of municipal
solid waste material, including waste obtained directly from a municipality or municipal
solid waste that was previously land-filled and subsequently recovered, or sewage sludge,
e.g., in the form of sewage sludge cake which contains substantial amounts of cellulosic
material. Since sewage sludge cakes will normally not contain substantial amounts of
recyclable materials (aluminum, glass, plastics, etc.), they can be directly treated with
concentrated sulfuric acid (to reduce the heavy metal content of the cellulosic component
of the waste) and processed in the ethanol production system. See, e.g., U.S. Patent Nos.
6,267,309; 5,975,439.

Another exemplary method using enzymes of the invention for recovering organic
and inorganic matter from waste material comprises sterilizing a solid organic matter and
softening it by subjecting it to heat and pressure. This exemplary process may be carried
out by first agitating waste material and then subjecting it to heat and pressure, which
sterilizes it and softens the organic matter contained therein. In one aspect, after heating
under pressure, the pressure may be suddenly released from a perforated chamber to
force the softened organic matter outwardly through perforations of the container, thus
separating the organic matter from the solid inorganic matter. The softened, sterilized
organic matter is then fermented in a fermentation chamber, e.g., using enzymes of the
invention, e.g., to form a mash. The mash may be subjected to further processing by
centrifuge, distillation column and/or anaerobic digester to recover fuels such as ethanol
and methane, and animal feed supplements. See, e.g., U.S. Patent No. 6,251,643.

Enzymes of the invention can also be used in processes, e.g., pretreatments, to
reduce the odor of an industrial waste, or a waste generated from an animal production
facility, and the like. For example, enzymes of the invention can be used to treat an
animal waste in a waste holding facility to enhance efficient degradation of large amounts
of organic matter with reduced odor. The process can also include inoculation with
sulfide-utilizing bacteria and organic digesting bacteria and lytic enzymes (in addition to
an enzyme of the invention). See, e.g., U.S. Patent No. 5,958,758.

Enzymes of the invention can also be used in mobile systems, e.g., batch type
reactors, for bioremediation of aqueous, hazardous wastes, e.g., as described in U.S.
Patent No. 5,833,857. Batch type reactors can be large vessels having circulatory
capability wherein bacteria (e.g., expressing an enzyme of the invention) are maintained
in an efficient state by nutrients being fed into the reactor. Such systems can be used
where effluent can be delivered to the reactor or the reactor is built into a waste water
treatment system. Enzymes of the invention can also be used in treatment systems for use
at small or temporary remote locations, e.g., portable, high volume, highly efficient,
versatile waste water treatment systems.

The waste treatment processes of the invention can include the use of any
combination of other enzymes such as other cellulase, e.g., endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase enzymes, catalases, laccases, other
cellulases, endoglycosidases, endo-beta-1,4-laccases, amyloglucosidases, other
glucosidases, glucose isomerases, glycosyltransferases, lipases, phospholipases,
lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases, peroxidases,
amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases,
phenoloxidases, ligninases, pullulanases, phytases, arabinanases, hemicellulases,
mannanases, xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl
esterases, proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases,
galactanases, pectin lyases, transglutaminases, pectin methylesterases, other
cellobiohydrolases and/or transglutaminases.

Detergent Compositions 

The invention provides detergent compositions comprising one or more
polypeptides of the invention (e.g., enzymes having cellulase, endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity) and methods of making
and using these compositions. The invention incorporates all methods of making and
using detergent compositions, see, e.g., U.S. Patent Nos. 6,413,928; 6,399,561; 6,365,561;
6,380,147. The detergent compositions can be a one and two part aqueous composition, a
non-aqueous liquid composition, a cast solid, a granular form, a particulate form, a
compressed tablet, a gel and/or a paste and a slurry form. The invention also provides
methods capable of a rapid removal of gross food soils, films of food residue and other
minor food compositions using these detergent compositions. Enzymes of the invention
can facilitate the removal of starchy stains by means of catalytic hydrolysis of the starch
polysaccharide. Enzymes of the invention can be used in dishwashing detergents or in
textile laundering detergents.

The actual active enzyme content depends upon the method of manufacture of a
detergent composition and is not critical, assuming the detergent solution has the desired
enzymatic activity. In one aspect, the amount of glucosidase present in the final solution
ranges from about 0.001 mg to 0.5 mg per gram of the detergent composition. The
particular enzyme chosen for use in the process and products of this invention depends
upon the conditions of final utility, including the physical product form, use pH, use
temperature, and soil types to be degraded or altered. The enzyme can be chosen to
provide optimum activity and stability for any given set of utility conditions. In one
aspect, the polypeptides of the present invention are active in the pH ranges of from about
4 to about 12 and in the temperature range of from about 20°C to about 95°C. The
detergents of the invention can comprise cationic, semi-polar nonionic or zwitterionic
surfactants; or, mixtures thereof.

Enzymes of the present invention (e.g., enzymes having cellulase, endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity) can be formulated into
powdered and liquid detergents having pH between 4.0 and 12.0 at levels of about 0.01 to
about 5% (preferably 0.1% to 0.5%) by weight. These detergent compositions can also
include other enzymes such as known proteases, cellulases, lipases or endoglycosidases,
as well as builders and stabilizers. The addition of enzymes of the invention to
conventional cleaning compositions does not create any special use limitation. In other
words, any temperature and pH suitable for the detergent is also suitable for the present
compositions as long as the pH is within the above range, and the temperature is below
the described enzyme's denaturing temperature. In addition, the polypeptides of the
invention can be used in a cleaning composition without detergents, again either alone or
in combination with builders and stabilizers.
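
By way of illustration only, the weight-percent formulation levels recited above can be converted to an enzyme mass per batch as sketched below. The function names and the batch size are hypothetical and are not part of the specification, and the "about" bounds are treated as hard limits purely as a simplifying assumption.

```python
# Illustrative sketch only: names and the hard range limits are assumptions.
LOW_WT_PERCENT = 0.01   # "about 0.01%" by weight, from the text
HIGH_WT_PERCENT = 5.0   # "about 5%" by weight, from the text


def enzyme_mass_g(batch_mass_g, weight_percent):
    """Return grams of enzyme needed for a detergent batch at a given weight percent."""
    if not LOW_WT_PERCENT <= weight_percent <= HIGH_WT_PERCENT:
        raise ValueError("weight percent outside the recited 0.01% to 5% range")
    return batch_mass_g * weight_percent / 100.0


def ph_is_suitable(ph, low=4.0, high=12.0):
    """Check a detergent pH against the recited pH 4.0 to 12.0 window."""
    return low <= ph <= high


if __name__ == "__main__":
    # A hypothetical 1 kg batch at the preferred upper level of 0.5% by weight.
    print(enzyme_mass_g(1000.0, 0.5), ph_is_suitable(9.5))
```

The same arithmetic applies to any batch size; only the weight percent and the pH window are taken from the paragraph above.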
The present invention provides cleaning compositions including detergent
compositions for cleaning hard surfaces, detergent compositions for cleaning fabrics,
dishwashing compositions, oral cleaning compositions, denture cleaning compositions,
and contact lens cleaning solutions.
In one aspect, the invention provides a method for washing an object comprising
contacting the object with a polypeptide of the invention under conditions sufficient for
washing. A polypeptide of the invention may be included as a detergent additive. The
detergent composition of the invention may, for example, be formulated as a hand or
machine laundry detergent composition comprising a polypeptide of the invention. A
laundry additive suitable for pre-treatment of stained fabrics can comprise a polypeptide
of the invention. A fabric softener composition can comprise a polypeptide of the
invention. Alternatively, a polypeptide of the invention can be formulated as a detergent
composition for use in general household hard surface cleaning operations. In alternative
aspects, detergent additives and detergent compositions of the invention may comprise
one or more other enzymes such as a protease, a lipase, a cutinase, another glucosidase, a
carbohydrase, another cellulase, a pectinase, a mannanase, an arabinase, a galactanase, a
xylanase, an oxidase, e.g., a laccase, and/or a peroxidase. The properties of the
enzyme(s) of the invention are chosen to be compatible with the selected detergent (i.e.,
pH-optimum, compatibility with other enzymatic and non-enzymatic ingredients, etc.)
and the enzyme(s) is present in effective amounts. In one aspect, enzymes of the
invention are used to remove malodorous materials from fabrics. Various detergent
compositions and methods for making them that can be used in practicing the invention
are described in, e.g., U.S. Patent Nos. 6,333,301; 6,329,333; 6,326,341; 6,297,038;
6,309,871; 6,204,232; 6,197,070; 5,856,164.

The detergents and related processes of the invention can also include the use of
any combination of other enzymes such as tryptophanases or tyrosine decarboxylases,
laccases, catalases, laccases, other cellulases, endoglycosidases, endo-beta-1,4-laccases,
amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases,
phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases,
peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases,
phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases,
xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases,
proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases,
galactanases, pectin lyases, transglutaminases, pectin methylesterases, other
cellobiohydrolases and/or transglutaminases.

Treating fabrics and textiles 

The invention provides methods of treating fabrics and textiles using one or more
polypeptides of the invention, e.g., enzymes having cellulase, endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity. The polypeptides of the
invention can be used in any fabric-treating method, which are well known in the art, see,
e.g., U.S. Patent No. 6,077,316. For example, in one aspect, the feel and appearance of a
fabric is improved by a method comprising contacting the fabric with an enzyme of the
invention in a solution. In one aspect, the fabric is treated with the solution under
pressure.

In one aspect, the enzymes of the invention are applied during or after the
weaving of textiles, or during the desizing stage, or one or more additional fabric
processing steps. During the weaving of textiles, the threads are exposed to considerable
mechanical strain. Prior to weaving on mechanical looms, warp yarns are often coated
with sizing starch or starch derivatives in order to increase their tensile strength and to
prevent breaking. The enzymes of the invention can be applied to remove these sizing
starch or starch derivatives. After the textiles have been woven, a fabric can proceed to a
desizing stage. This can be followed by one or more additional fabric processing steps.
Desizing is the act of removing size from textiles. After weaving, the size coating must
be removed before further processing the fabric in order to ensure a homogeneous and
wash-proof result. The invention provides a method of desizing comprising enzymatic
hydrolysis of the size by the action of an enzyme of the invention.

The enzymes of the invention (e.g., enzymes having cellulase, endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity) can be used to desize
fabrics, including cotton-containing fabrics, as detergent additives, e.g., in aqueous
compositions. The invention provides methods for producing a stonewashed look on
indigo-dyed denim fabric and garments. For the manufacture of clothes, the fabric can be
cut and sewn into clothes or garments, which are afterwards finished. In particular, for the
manufacture of denim jeans, different enzymatic finishing methods have been developed.
The finishing of denim garments normally is initiated with an enzymatic desizing step,
during which garments are subjected to the action of amylolytic enzymes in order to
provide softness to the fabric and make the cotton more accessible to the subsequent
enzymatic finishing steps. The invention provides methods of finishing denim garments
(e.g., a "bio-stoning process"), enzymatic desizing and providing softness to fabrics using
the enzymes of the invention. The invention provides methods for quickly softening
denim garments in a desizing and/or finishing process.

The invention also provides disinfectants comprising enzymes of the invention
(e.g., enzymes having cellulase, endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity).

The fabric or textile treatment processes of the invention can also include the use
of any combination of other enzymes such as tryptophanases or tyrosine decarboxylases,
laccases, catalases, laccases, other cellulases, endoglycosidases, endo-beta-1,4-laccases,
amyloglucosidases, other glucosidases, glucose isomerases, glycosyltransferases, lipases,
phospholipases, lipooxygenases, beta-laccases, endo-beta-1,3(4)-laccases, cutinases,
peroxidases, amylases, glucoamylases, pectinases, reductases, oxidases, decarboxylases,
phenoloxidases, ligninases, pullulanases, arabinanases, hemicellulases, mannanases,
xylolaccases, xylanases, pectin acetyl esterases, rhamnogalacturonan acetyl esterases,
proteases, peptidases, proteinases, polygalacturonases, rhamnogalacturonases,
galactanases, pectin lyases, transglutaminases, pectin methylesterases, other
cellobiohydrolases and/or transglutaminases.

Paper or pulp treatment 

The enzymes of the invention (e.g., enzymes having cellulase, endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity) can be used in paper or pulp
treatment or paper deinking. For example, in one aspect, the invention provides a paper
treatment process using enzymes of the invention. In one aspect, the enzymes of the
invention can be used to modify starch in the paper thereby converting it into a liquefied
form. In another aspect, enzymes of the invention can be used to treat paper components
of recycled photocopied paper during chemical and enzymatic deinking processes. In one
aspect, enzymes of the invention can be used in combination with other enzymes,
including other cellulases (including other endoglucanases, cellobiohydrolases and/or
beta-glucosidases). The wood, paper, paper product or pulp can be treated by the
following three processes: 1) disintegration in the presence of an enzyme of the invention,
2) disintegration with a deinking chemical and an enzyme of the invention, and/or
3) disintegration after soaking with an enzyme of the invention. The recycled paper
treated with an enzyme of the invention can have a higher brightness due to removal of
toner particles as compared to the paper treated with just cellulase. While the invention is
not limited by any particular mechanism, the effect of an enzyme of the invention may be
due to its behavior as a surface-active agent in pulp suspension.

The invention provides methods of treating paper and paper pulp using one or
more polypeptides of the invention. The polypeptides of the invention can be used in any
paper- or pulp-treating method, which are well known in the art, see, e.g., U.S. Patent Nos.
6,241,849; 6,066,233; 5,582,681. For example, in one aspect, the invention provides a
method for deinking and decolorizing a printed paper containing a dye, comprising
pulping a printed paper to obtain a pulp slurry, and dislodging an ink from the pulp slurry
in the presence of an enzyme of the invention (other enzymes can also be added). In
another aspect, the invention provides a method for enhancing the freeness of pulp, e.g.,
pulp made from secondary fiber, by adding an enzymatic mixture comprising an enzyme
of the invention (can also include other enzymes, e.g., pectinase enzymes) to the pulp and
treating under conditions to cause a reaction to produce an enzymatically treated pulp.
The freeness of the enzymatically treated pulp is increased from the initial freeness of the
secondary fiber pulp without a loss in brightness.

The paper, wood or pulp treatment or recycling processes of the invention can also
include the use of any combination of other enzymes such as tryptophanases or tyrosine
decarboxylases, laccases, catalases, laccases, other cellulases, endoglycosidases,
endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases,
glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases,
endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases,
reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases,
arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases,
rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases,
polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases,
transglutaminases, pectin methylesterases, other cellobiohydrolases and/or
transglutaminases.

Repulping: treatment of lignocellulosic materials

The invention also provides a method for the treatment of lignocellulosic fibers,
wherein the fibers are treated with a polypeptide of the invention (e.g., enzymes having
cellulase, endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase activity),
in an amount which is efficient for improving the fiber properties. The enzymes of the
invention may also be used in the production or recycling of lignocellulosic materials
such as pulp, paper and cardboard, from starch reinforced waste paper and cardboard,
especially where repulping or recycling occurs at pH above 7 and where the enzymes of
the invention can facilitate the disintegration of the waste material through degradation of
the reinforcing starch. The enzymes of the invention can be useful in a process for
producing a papermaking pulp from starch-coated printed paper. The process may be
performed as described in, e.g., WO 95/14807. An exemplary process comprises
disintegrating the paper to produce a pulp, treating with a starch-degrading enzyme
before, during or after the disintegrating, and separating ink particles from the pulp after
disintegrating and enzyme treatment. See also U.S. Patent No. 6,309,871 and other US
patents cited herein. Thus, the invention includes a method for enzymatic deinking of
recycled paper pulp, wherein the polypeptide is applied in an amount which is efficient
for effective de-inking of the fiber surface.

Brewing and fermenting 

The invention provides methods of brewing (e.g., fermenting) beer comprising an
enzyme of the invention, e.g., enzymes having cellulase, endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity. In one exemplary
process, starch-containing raw materials are disintegrated and processed to form a malt.
An enzyme of the invention is used at any point in the fermentation process. For
example, enzymes of the invention can be used in the processing of barley malt. The
major raw material of beer brewing is barley malt. This can be a three stage process.
First, the barley grain can be steeped to increase water content, e.g., to about 40%.
Second, the grain can be germinated by incubation at 15-25°C for 3 to 6 days when
enzyme synthesis is stimulated under the control of gibberellins. During this time
enzyme levels rise significantly. In one aspect, enzymes of the invention are added at this
(or any other) stage of the process. The action of the enzyme results in an increase in
fermentable reducing sugars. This can be expressed as the diastatic power, DP, which can
rise from around 80 to 190 in 5 days at 12°C.

Enzymes of the invention can be used in any beer producing process, as described, 
e.g., in U.S. Patent No. 5,762,991; 5,536,650; 5,405,624; 5,021,246; 4,788,066. 

Increasing the flow of production fluids from a subterranean formation
The invention also includes a method using an enzyme of the invention (e.g.,
enzymes having cellulase, endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity), wherein the method increases the flow of production fluids from a
subterranean formation by removing viscous, starch-containing, damaging fluids formed
during production operations; these fluids can be found within the subterranean formation
which surrounds a completed well bore. Thus, this method of the invention results in
production fluids being able to flow from the well bore. This method of the invention
also addresses the problem of damaging fluids reducing the flow of production fluids
from a formation below expected flow rates. In one aspect, the invention provides for
formulating an enzyme treatment (using an enzyme of the invention) by blending together
an aqueous fluid and a polypeptide of the invention; pumping the enzyme treatment to a
desired location within the well bore; allowing the enzyme treatment to degrade the
viscous, starch-containing, damaging fluid, whereby the fluid can be removed from the
subterranean formation to the well surface; and wherein the enzyme treatment is effective
to attack the alpha glucosidic linkages in the starch-containing fluid.

The subterranean formation enzyme treatment processes of the invention can also
include the use of any combination of other enzymes such as tryptophanases or tyrosine
decarboxylases, laccases, catalases, laccases, other cellulases, endoglycosidases,
endo-beta-1,4-laccases, amyloglucosidases, other glucosidases, glucose isomerases,
glycosyltransferases, lipases, phospholipases, lipooxygenases, beta-laccases,
endo-beta-1,3(4)-laccases, cutinases, peroxidases, amylases, glucoamylases, pectinases,
reductases, oxidases, decarboxylases, phenoloxidases, ligninases, pullulanases,
arabinanases, hemicellulases, mannanases, xylolaccases, xylanases, pectin acetyl esterases,
rhamnogalacturonan acetyl esterases, proteases, peptidases, proteinases,
polygalacturonases, rhamnogalacturonases, galactanases, pectin lyases,
transglutaminases, pectin methylesterases, other cellobiohydrolases and/or
transglutaminases.

Pharmaceutical compositions and dietary supplements 

The invention also provides pharmaceutical compositions and dietary supplements
(e.g., dietary aids) comprising a cellulase of the invention (e.g., enzymes having
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase activity). The
cellulase activity comprises endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity. In one aspect, the pharmaceutical compositions and dietary
supplements (e.g., dietary aids) are formulated for oral ingestion, e.g., to improve the
digestibility of foods and feeds having a high cellulose or lignocellulosic component.

Periodontal treatment compounds can comprise an enzyme of the invention, e.g.,
as described in U.S. Patent No. 6,776,979. Compositions and methods for the treatment or
prophylaxis of acidic gut syndrome can comprise an enzyme of the invention, e.g., as
described in U.S. Patent No. 6,468,964.

In another aspect, wound dressings, implants and the like comprise antimicrobial
(e.g., antibiotic-acting) enzymes, including an enzyme of the invention (including, e.g.,
exemplary sequences of the invention). Enzymes of the invention can also be used in
alginate dressings, antimicrobial barrier dressings, burn dressings, compression bandages,
diagnostic tools, gel dressings, hydro-selective dressings, hydrocellular (foam) dressings,
hydrocolloid dressings, I.V. dressings, incise drapes, low adherent dressings, odor
absorbing dressings, paste bandages, post operative dressings, scar management, skin
care, transparent film dressings and/or wound closure. Enzymes of the invention can be
used in wound cleansing, wound bed preparation, to treat pressure ulcers, leg ulcers,
burns, diabetic foot ulcers, scars, IV fixation, surgical wounds and minor wounds.
Enzymes of the invention can be used in sterile enzymatic debriding compositions,
e.g., ointments. In various aspects, the cellulase is formulated as a tablet, gel, pill,
implant, liquid, spray, powder, food, feed pellet or as an encapsulated formulation.

Biodefense applications 

In other aspects, cellulases of the invention (e.g., enzymes having endoglucanase,
cellobiohydrolase, mannanase and/or beta-glucosidase activity) can be used in biodefense
(e.g., destruction of spores or bacteria comprising a lignocellulosic material). Use of
cellulases of the invention in biodefense applications offers a significant benefit, in that
they can be very rapidly developed against any currently unknown or biological warfare
agents of the future. In addition, cellulases of the invention can be used for
decontamination of affected environments. In one aspect, the invention provides a biodefense
or bio-detoxifying agent comprising a polypeptide having a cellulase activity, wherein the
polypeptide comprises a sequence of the invention (including, e.g., exemplary sequences
of the invention), or a polypeptide encoded by a nucleic acid of the invention (including,
e.g., exemplary sequences of the invention), wherein optionally the polypeptide has
activity comprising endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity.
The following examples are offered to illustrate, but not to limit the claimed
invention.

EXAMPLES
Example 1: GIGAMATRIX™ screen 

In one aspect, the methods of the invention use Diversa Corporation's proprietary
GIGAMATRIX™ platform; see PCT Patent Publication No. WO 01/38583; U.S. patent
application nos. 20050046833; 20020080350; U.S. Patent No. 6,918,738; Design Patent
No. D480,814. For example, in one aspect, GIGAMATRIX™ is used in methods to
determine if a polypeptide has cellulase activity and is within the scope of the invention,
or, to identify and isolate a polypeptide having cellulase activity.

A GIGAMATRIX™ platform can include an ultra-high throughput screen based
on a 100,000 well microplate with the dimensions of a conventional 96 well plate. In this
example, the GIGAMATRIX™ screen was implemented using 2 substrates based on
previously shown activity by CBHs. Methyl-umbelliferyl cellobioside (MUC) and
methylumbelliferyl lactoside (MUL) were tested. Phagemid versions of the different
clones were screened because the substrate diffuses into cells and fluorescence was
thought to be more easily detectable. A host strain lacking beta-galactosidase was used in
order to decrease activity on the lactoside substrate. The lactoside substrate resulted in
fewer hits and was deemed more specific than the cellobiose substrate. In addition, the
lactoside substrate resulted in fewer beta-glucosidase hits. In order to test the feasibility of
using these substrates in a screen, 14 libraries were chosen for screening based on the fact
that these libraries yielded endoglucanase hits from a previous screening program. Of the
libraries screened, there were a total of 50 primary hits from 11 of the libraries screened.
Secondary screening consisted of plating the clones on agar plates and then colony
picking into 384 well plates containing media and MUL. Active clones against MUL are
differentiated from a background of inactive clones. Individual clones were then grown
overnight and fluorescence was measured and the most active hits were picked for
sequencing.
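
As a non-limiting illustration of the secondary screening step described above, the selection of the most active clones from overnight fluorescence readings might be sketched as follows. The clone names, fluorescence values, background threshold and function name are all hypothetical and are not taken from the screen itself.

```python
# Illustrative sketch only: all names, readings and thresholds are hypothetical.
def pick_top_hits(readings, background, n):
    """Return up to n clone names with fluorescence above background, most active first."""
    active = {clone: value for clone, value in readings.items() if value > background}
    return sorted(active, key=active.get, reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical relative fluorescence units for four wells of a 384 well plate.
    example = {"clone_A": 120.0, "clone_B": 5400.0, "clone_C": 3100.0, "clone_D": 95.0}
    print(pick_top_hits(example, background=500.0, n=2))
```

Filtering against a background value before ranking mirrors the statement that active clones are differentiated from a background of inactive clones.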
All genomic clone inserts from hits were sequenced. In general, the hits were from
several different glycosyl hydrolase families including 1, 2, 5, 6, 10 and 16. Several other
hits were discovered where the open reading frame was not homologous to any known
glycosyl hydrolase families. In addition, some of the hits encoded GTP cyclohydrolase
genes.



Table 1. Summary of GIGAMATRIX™ hits

Enzyme No. | Open Reading Frame SEQ ID NO: | Nearest relevant BLAST
1 | SEQ ID NO:22 (encoded by, e.g. SEQ ID NO:21) | ORF 001 - family 5 (cellulase)
1a | SEQ ID NO:24 (encoded by SEQ ID NO:23) | ORF 003 - family 16 + CBM
2 | SEQ ID NO:26 (encoded by, e.g. SEQ ID NO:25) | ORF 001 - family 1 (β-glucosidase)
3 | SEQ ID NO:92 (encoded by, e.g. SEQ ID NO:91) | ORF 001 - family 3
3a | SEQ ID NO:94 (encoded by, e.g. SEQ ID NO:93) | ORF 002 - alpha-rhamnosidase
4 | SEQ ID NO:96 (encoded by, e.g. SEQ ID NO:95) | ORF 001 - family 3
4a | SEQ ID NO:98 (encoded by, e.g. SEQ ID NO:97) | [illegible]
5 | SEQ ID NO:128 (encoded by, e.g. SEQ ID NO:127) | ORF 004 - short chain dehydrogenase
5a | SEQ ID NO:130 (encoded by, e.g. SEQ ID NO:129) | ORF 010 - short chain dehydrogenase
6 | SEQ ID NO:116 (encoded by, e.g. SEQ ID NO:115) | ORF 004 - short chain dehydrogenase
6a | SEQ ID NO:118 (encoded by, e.g. SEQ ID NO:117) | ORF 011 - short chain dehydrogenase
7 | SEQ ID NO:40 (encoded by, e.g. SEQ ID NO:39) | ORF 004 - putative oxidoreductase
8 | SEQ ID NO:42 (encoded by, e.g. SEQ ID NO:41) | ORF 004 - cysteinyl tRNA synthetase
8a | SEQ ID NO:44 (encoded by, e.g. SEQ ID NO:43) | ORF 011 - hypothetical protein
9 | [illegible] | ORF 002 - radical SAM family
10 | [illegible] | ORF [illegible] - family 1 (β-glucosidase)
11 | [illegible] | ORF [illegible] - subtilisin like protease
12 | SEQ ID NO:46 (encoded by, e.g. SEQ ID NO:45) | ORF 006 - family 1 (β-glucosidase)
13 | SEQ ID NO:8 (encoded by, e.g. SEQ ID NO:7) | ORF 003 - isocitrate dehydrogenase
13a | SEQ ID NO:10 (encoded by, e.g. SEQ ID NO:9) | ORF 004 - family 10 (xylanase)
14 | SEQ ID NO:48 (encoded by, e.g. SEQ ID NO:47) | ORF 002 - family 1 (β-glucosidase)
14a | SEQ ID NO:50 (encoded by, e.g. SEQ ID NO:49) | ORF 006 - fdhd/narq oxidoreductase
15 | SEQ ID NO:4 (encoded by, e.g. SEQ ID NO:3) | ORF 008 - family 1 (β-glucosidase)
15a | SEQ ID NO:6 (encoded by, e.g. SEQ ID NO:5) | ORF 012 - family 6 (cellulase)
16 | SEQ ID NO:136 (encoded by, e.g. SEQ ID NO:135) | ORF 001 - cellulase (glycosyl hydrolase family 5)
17 | SEQ ID NO:56 (encoded by, e.g. SEQ ID NO:55) | ORF 004 - family 1 (β-glucosidase)
18 | SEQ ID NO:126 (encoded by, e.g. SEQ ID NO:125) | ORF 009 - family 1 (β-glucosidase)
19 | SEQ ID NO:120 (encoded by, e.g. SEQ ID NO:119) | ORF 002 - oxidoreductase
19a | SEQ ID NO:122 (encoded by, e.g. SEQ ID NO:121) | ORF 004 - family 5 (cellulase)
20 | SEQ ID NO:124 (encoded by, e.g. SEQ ID NO:123) | ORF 006 - family 1 (β-glucosidase)
21 | SEQ ID NO:132 (encoded by, e.g. SEQ ID NO:131) | ORF 007 - family 5 (cellulase)
22 | SEQ ID NO:38 (encoded by, e.g. SEQ ID NO:37) | ORF 011 - family 1 (β-glucosidase)
22a | SEQ ID NO:36 (encoded by, e.g. SEQ ID NO:35) | ORF 007 - family 5 (cellulase)
23 | SEQ ID NO:138 (encoded by, e.g. SEQ ID NO:137) | ORF 001 - peptidase_M37
24 | SEQ ID NO:146 (encoded by, e.g. SEQ ID NO:145) | ORF 002 - family 1 (β-glucosidase)
25 | SEQ ID NO:52 (encoded by, e.g. SEQ ID NO:51) | ORF 001 - family 5 (cellulase)
26 | SEQ ID NO:20 (encoded by, e.g. SEQ ID NO:19) | [illegible]
26a | SEQ ID NO:18 (encoded by, e.g. SEQ ID NO:17) | ORF [illegible] - β-lactamase
27 | SEQ ID NO:16 (encoded by, e.g. SEQ ID NO:15) | ORF 007 - family 1 (β-glucosidase)
27a | SEQ ID NO:14 (encoded by, e.g. SEQ ID NO:13) | ORF [illegible] - NADH dependent dehydrogenase
27b | SEQ ID NO:12 (encoded by, e.g. SEQ ID NO:11) | ORF [illegible] - NAD binding oxidoreductase
28 | SEQ ID NO:28 (encoded by, e.g. SEQ ID NO:27) | ORF 002 - family 1 (β-glucosidase)
29 | SEQ ID NO:114 (encoded by, e.g. SEQ ID NO:113) | ORF 003 - family 10
30 | SEQ ID NO:34 (encoded by, e.g. SEQ ID NO:33) | ORF 006 - family 1 (β-glucosidase)
30a | SEQ ID NO:32 (encoded by, e.g. SEQ ID NO:31) | ORF 002 - cellodextrin phosphorylase
31 | SEQ ID NO:30 (encoded by, e.g. SEQ ID NO:29) | ORF 004 - family 1 (β-glucosidase)
32 | SEQ ID NO:100 (encoded by, e.g. SEQ ID NO:99) | ORF 012 - family 1 (β-glucosidase)
33 | SEQ ID NO:84 (encoded by, e.g. SEQ ID NO:83) | ORF 008 - dehydrogenase
34 | SEQ ID NO:102 (encoded by, e.g. SEQ ID NO:101) | ORF 003 - family 5 (cellulase)
35 | SEQ ID NO:140 (encoded by, e.g. SEQ ID NO:139) | ORF 001 - threonine dehydrogenase
36 | SEQ ID NO:142 (encoded by, e.g. SEQ ID NO:141) | ORF 005 - family 1 (β-glucosidase)
37 | SEQ ID NO:144 (encoded by, e.g. SEQ ID NO:143) | ORF 003 - family 1 (β-glucosidase)
38 | SEQ ID NO:2 (encoded by, e.g. SEQ ID NO:1) | ORF 001 - family 1 (β-glucosidase)
39 | SEQ ID NO:86 (encoded by, e.g. SEQ ID NO:85) | ORF 008 - family 1 (β-glucosidase)



Abbreviations: CBM - carbohydrate binding module 
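
For illustration only, the "nearest relevant BLAST" annotations of Table 1 can be tallied by glycosyl hydrolase family as sketched below. Only a handful of clearly legible rows of Table 1 are transcribed here, and the helper names are hypothetical; annotations that do not name a family are counted under None.

```python
# Illustrative sketch only: helper names are hypothetical; rows are a subset of Table 1.
import re
from collections import Counter

SAMPLE_ROWS = {
    "1": "ORF 001 - family 5 (cellulase)",
    "1a": "ORF 003 - family 16 + CBM",
    "2": "ORF 001 - family 1 (beta-glucosidase)",
    "13a": "ORF 004 - family 10 (xylanase)",
    "15a": "ORF 012 - family 6 (cellulase)",
    "22": "ORF 011 - family 1 (beta-glucosidase)",
    "22a": "ORF 007 - family 5 (cellulase)",
    "23": "ORF 001 - peptidase_M37",
}


def family_of(annotation):
    """Return the glycosyl hydrolase family number named in an annotation, or None."""
    match = re.search(r"family\s+(\d+)", annotation, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def tally_families(rows):
    """Count annotations per family number (None for non-family annotations)."""
    return Counter(family_of(annotation) for annotation in rows.values())


if __name__ == "__main__":
    print(tally_families(SAMPLE_ROWS))
```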



Characterization enzyme and substrate activity 

The 39 hits (see Table 1, above) discovered in the GIGAMATRIX™ screen were
first screened against cellohexaose to determine action pattern on a cellulose oligomer.
Genomic clones are defined as clones that have an entire DNA insert potentially
containing multiple open reading frames. For example, in Table 1, above, one such
genomic clone contains two open reading frames annotated as Enzymes No. 22 and 22a,
with said open reading frames having the sequences as depicted in SEQ ID NO:37 and
SEQ ID NO:35, respectively. Another such genomic clone contains three open reading
frames, which are annotated as Enzymes 27, 27a and 27b. Subclones are derived from
genomic clones and can contain only a single open reading frame. Genomic clones were
grown overnight in TB media containing antibiotic, cells were lysed and lysates were
clarified by centrifugation. Subclones are grown to an OD600=0.5, induced with an
appropriate inducer and then grown an additional 3 h before lysing the cells and clarifying
the lysate. Genomic clones will generally have less activity than a subclone, but are a
more facile way of assessing activity in a large range of clones. Initial studies were
performed using thin layer chromatography (TLC) for endpoint reactions usually run for
24 h. Enzymes were also tested on phosphoric acid swollen cellulose (PASC), which is
crystalline cellulose that is made more amorphous through swelling by acid treatment.
A number of cellulases which were cloned from environmental libraries were
active against PASC, but released cellobiose as well as cellotriose and/or glucose. The
genomic clones from the GIGAMATRIX™ discovery effort were also tested against
PASC and on cellulosic substrates such as cellohexaose (Seikagaku, Japan). Thin layer
chromatography (TLC) experiments showed that several genomic clones were able to
hydrolyze the cellohexaose, as illustrated in Figures 6 and 7. Of these clones, many were
able to generate glucose as the final product, which is consistent with the fact that they
have sequence identity to glycosyl hydrolase family 1, which includes beta-glucosidases.
Several enzymes produced cellobiose and/or larger fragments, but the exact nature of the
product pattern could not be discerned from the TLC experiments, so a capillary
electrophoresis (CE) method was developed.

Example 2: Capillary Electrophoresis 

In some aspects, Capillary Electrophoresis (CE) is used in assays to screen for
enzyme activity, e.g., CE is used in methods to determine if a polypeptide has cellulase
activity and is within the scope of the invention, or, to identify and isolate a polypeptide
having cellulase activity. Capillary Electrophoresis (CE) offers the advantages of faster
run times and greater assay sensitivity. The CE method used 1-aminopyrene-3,6,8-trisulfonate
(APTS) as the fluorophore and was optimized for use with sugars and sugar
oligomers (Guttman (1996) High-resolution capillary gel electrophoresis of reducing
oligosaccharides labeled with 1-aminopyrene-3,6,8-trisulfonate. Anal. Biochem. 233:234-242).
Enzymes that were shown to be active on cellohexaose were subjected to tests on
phosphoric acid swollen cellulose as well as cellohexaose. Genes were subcloned,
expressed and partially purified using a nickel-chelating column. Enzymes were
incubated with substrate for 1 h and the products were analyzed using a 10 cm or 48 cm
capillary. Cellohexaose elutes at 2 and 9 minutes for the 10 and 48 cm capillaries,
respectively. The 48 cm capillary gives better separation of products in case there are low
amounts of sugar or if there are contaminants in the mixture. The CE method was
implemented for studies on enzymes from the GIGAMATRIX™ discovery that showed
good activity on cellohexaose with TLC detection.

Enzyme 22/22a (see Table 1, above) showed good performance on PASC (data 
summarized in graph form in Figure 8), releasing mainly cellobiose. In addition, enzyme 
22/22a was able to release cellobiose from AVICEL® Microcrystalline Cellulose (MCC) 
(FMC Corporation, Philadelphia, PA) (data summarized in graph form in Figure 9). 
Sequence analysis showed that enzyme 22 and enzyme 21 are ~92% identical and belong 
to glycosyl hydrolase family 5. Family 5 contains mainly endoglucanases, but there are 
examples of cellobiohydrolases. CelO from Clostridium thermocellum has been 
characterized as a cellobiohydrolase based on its release of only cellobiose from 
amorphous and crystalline cellulose (Zverlov (2002) A newly described cellulosomal 
cellobiohydrolase, CelO, from Clostridium thermocellum: investigation of the exo-mode 
of hydrolysis, and binding capacity to crystalline cellulose. Microbiology 148:247-255). 

All three of these enzymes, when compared to the endoglucanase from 
Acidothermus cellulolyticus, have an insertion that is in close proximity to the substrate 
binding site. This insertion could form a loop which encloses the substrate binding site, 
thus converting this enzyme from an endoglucanase to a cellobiohydrolase. When these 
enzymes were tested on cellohexaose they produced mainly cellobiose with a smaller 
amount of cellotriose. These results are explained by the fact that cellobiohydrolases have 
the capability to produce both cellobiose and cellotriose from a cellohexaose substrate 
(Harjunpaa (1996) Cello-oligosaccharide hydrolysis by cellobiohydrolase II from 
Trichoderma reesei. Association and rate constants derived from an analysis of progress 
curves. Eur. J. Biochem. 240:584-591). 

Example 3: Sequence Based Discovery 

The invention provides methods for identifying and isolating cellulases, e.g., 
cellobiohydrolases, using sequences of the invention. In one exemplary method, primers 
that were homologous to conserved regions of three glycosyl hydrolase families that 
contain cellobiohydrolases were used to screen either polynucleotide libraries or DNA 
derived from fungal samples. Primers were designed towards family 48 conserved 
regions and 96 libraries were screened, resulting in 1 confirmed hit. In addition, primers 
were designed towards family 6 and family 7. Fungal libraries were screened with these 
primers, resulting in 1 hit for family 6 and 56 hits for family 7. One of the family 7 hits 
was chosen for studies to extract the full-length sequence. The full-length sequence was 
successfully obtained and showed 73% identity to exo-cellobiohydrolase I of Penicillium 
janthinellum. 

Example 4: Genetic Engineering of an Enzyme with Cellobiohydrolase Activity 

This example describes the genetic engineering of an exemplary enzyme of the 
invention. This enzyme can be used in the conversion of biomass to fuels and chemicals, 
and for making effective and sustainable alternatives to petroleum-based products. This 
enzyme can be expressed in organisms (e.g., microorganisms, such as bacteria) for its 
participation in chemical cycles involving natural biomass conversion. In one aspect, this 
enzyme is used in "enzyme ensembles" for the efficient depolymerization of cellulosic 
and hemicellulosic polymers to metabolizable carbon moieties. As discussed above, the 
invention provides methods for discovering and implementing the most effective of 
enzymes to enable these important new "biomass conversion" and alternative energy 
industrial processes. 

This effort used metagenomic discovery and a non-stochastic method of directed evolution 
(called "DIRECTEVOLUTION®", as described, e.g., in U.S. Patent No. 6,939,689), which 
includes Gene Site Saturation Mutagenesis (GSSM) (as discussed above, see also U.S. 
Patent Nos. 6,171,820 and 6,579,258) and Tunable GeneReassembly (TGR) (see, e.g., 
U.S. Patent No. 6,537,776) technologies. This effort focused on the discovery and 
optimization of an important enzyme component for cellulose reduction to glucose, 
cellobiohydrolase. 

An enzyme discovery screen was implemented using Diversa Corporation's 
GIGAMATRIX™ high throughput expression screening platform (discussed above) to 
identify cellobiohydrolases using methylumbelliferyl cellobioside as substrate. A total of 
100 complex environmental libraries were screened, resulting in 25 confirmed 
cellobiohydrolase hits mainly from glycosyl hydrolase families 5 and 10. These hits were 
characterized for activity against AVICEL® Microcrystalline Cellulose (MCC) (FMC 
Corporation, Philadelphia, PA). Based on its performance characteristics, one enzyme, 
SEQ ID NO:162 (encoded by, e.g., SEQ ID NO:161), was chosen as a candidate for 
optimization using Gene Site Saturation Mutagenesis (GSSM) technology. However, 
before GSSM evolution was performed, the signal sequence (amino acids 1 through 30) 
was removed from SEQ ID NO:162 and a starting methionine was added. This signal- 
free sequence, hereinafter called the "wild-type" and represented by SEQ ID NO:164 
(encoded by, e.g., SEQ ID NO:163), was the parental sequence that was optimized using 
GSSM technology. As discussed above, GSSM technology can rapidly mutate all amino 
acids in the protein to the 19 other amino acids in a sequential fashion. Mutants were 
screened using a fiber-based assay and potential upmutants representing single amino 
acid changes were identified. These upmutants were combined into a new library 
representing combinations of the upmutants. This library was screened, resulting in 
identification of several candidate enzymes for commercialization. 
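
By way of non-limiting illustration, the sequence space covered by a saturation step of this kind (each position changed to each of the 19 other amino acids) and the later combination of upmutants can be sketched in Python. The toy parent peptide and the function names below are hypothetical and are not taken from this disclosure; the sketch works on amino acid strings only and does not model the codon-level mutagenesis of GSSM technology.

```python
# Hypothetical illustration: the parent below is a toy peptide, not SEQ ID NO:164.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitution_variants(parent):
    """Yield (position, old_aa, new_aa, variant) for every single substitution.

    Positions are 1-based, matching the residue numbering used in the tables.
    """
    for index, old_aa in enumerate(parent):
        for new_aa in AMINO_ACIDS:
            if new_aa == old_aa:
                continue  # only the 19 *other* amino acids are substituted in
            variant = parent[:index] + new_aa + parent[index + 1:]
            yield index + 1, old_aa, new_aa, variant


def combine_upmutants(parent, upmutants):
    """Apply several (position, new_aa) substitutions to one parent sequence."""
    residues = list(parent)
    for position, new_aa in upmutants:
        residues[position - 1] = new_aa
    return "".join(residues)


toy_parent = "MFPYA"
variants = list(single_substitution_variants(toy_parent))
print(len(variants))  # 19 substitutions at each of 5 positions
print(combine_upmutants(toy_parent, [(1, "R"), (4, "L")]))
```

For a protein of length L this enumeration yields 19 x L single-substitution variants, which is why a high throughput screen is needed to evaluate them.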

Research Summary 
GIGAMATRIX™ screen 

The GIGAMATRIX™ (GMx) screening platform is an ultra-high throughput 
method based on a 100,000 well microplate with the dimensions of a conventional 96 
well plate (see Phase II application for details). The screen works with fluorescent 
substrates. The GMx screen was implemented using 2 substrates based on previously 
shown activity by cellulases. Methylumbelliferyl cellobioside (MUC) was used as the 
screening substrate. In addition, resorufin-beta-glucopyranoside was also included in the 
screen in order to eliminate clones that have activity on both substrates and are presumed 
to be beta-glucosidases. 

Amplified phage or phagemid versions of the target libraries were screened. Two 
host strains (CEH6 & GAL631) lacking beta-galactosidase genes were used in order to 
decrease endogenous host activity on the substrates. One hundred libraries were chosen for 
screening based on the fact that these libraries yielded cellulase hits from a previous 
screening program. Of the libraries screened, there were a total of 355 primary hits from 
69 of the libraries screened. 

Secondary screening consisted of plating the clones on agar plates and then colony 
picking into 384 well plates containing media and methylumbelliferyl cellobioside 
(MUC), a step termed a "breakout". Figure 10 illustrates in graphic form data showing a typical 
GIGAMATRIX™ (GMx) breakout. To generate this data, active clones against MUC 
(i.e., able to hydrolyze methylumbelliferyl cellobioside) are differentiated from a 
background of inactive clones. Individual clones were then grown overnight, 
fluorescence was measured, and the most active hits were picked for sequencing. In 
Figure 10, the X axis shows sample name; the Y axis is relative fluorescent units. Positive 
"hits" were plated onto agar plates and then colony picked into 384 well plates containing 
LB + antibiotic plus 50 nM MUC and grown overnight. 



Table 2. Summary of GIGAMATRIX™ (GMx) hits 

Enzyme No.  Open Reading Frame SEQ ID NO:                     Clone Family Characterization 
40          SEQ ID NO:104 (encoded by, e.g., SEQ ID NO:103)   family 5 (cellulase) 
41          SEQ ID NO:108 (encoded by, e.g., SEQ ID NO:107)   family 5 (cellulase) 
42          SEQ ID NO:112 (encoded by, e.g., SEQ ID NO:111)   family 5 (cellulase) 
H7          SEQ ID NO:60 (encoded by, e.g., SEQ ID NO:59)     family 5 (cellulase) 
43          SEQ ID NO:82 (encoded by, e.g., SEQ ID NO:81)     family 5 (cellulase) 
44          SEQ ID NO:78 (encoded by, e.g., SEQ ID NO:77)     family 5 (cellulase) 
45          SEQ ID NO:68 (encoded by, e.g., SEQ ID NO:67)     family 5 (cellulase) - ORF 2 
45a         SEQ ID NO:70 (encoded by, e.g., SEQ ID NO:69)     family 26 (mannanase) - 
46          SEQ ID NO:74 (encoded by, e.g., SEQ ID NO:73)     family 10 (xylanase) 
47          SEQ ID NO:110 (encoded by, e.g., SEQ ID NO:109)   family 10 (xylanase) 
48          SEQ ID NO:106 (encoded by, e.g., SEQ ID NO:105)   family 5 (cellulase) 
49          SEQ ID NO:66 (encoded by, e.g., SEQ ID NO:65)     family 10 (xylanase) 
50          SEQ ID NO:72 (encoded by, e.g., SEQ ID NO:71)     family 5 (cellulase) 
51          SEQ ID NO:80 (encoded by, e.g., SEQ ID NO:79)     family 5 (cellulase) 
H8          SEQ ID NO:62 (encoded by, e.g., SEQ ID NO:61)     family 5 (cellulase) ORF 1 
H8a         SEQ ID NO:64 (encoded by, e.g., SEQ ID NO:63)     family 5 (cellulase) ORF 4 
52          SEQ ID NO:76 (encoded by, e.g., SEQ ID NO:75)     family 5 (cellulase) 
53          SEQ ID NO:160 (encoded by, e.g., SEQ ID NO:159)   family 10 (xylanase) 
54          SEQ ID NO:88 (encoded by, e.g., SEQ ID NO:87)     family 5 (cellulase) 
55          SEQ ID NO:148 (encoded by, e.g., SEQ ID NO:147)   family 10 (xylanase) 
56          SEQ ID NO:90 (encoded by, e.g., SEQ ID NO:89)     family 5 (cellulase) 
57          SEQ ID NO:152 (encoded by, e.g., SEQ ID NO:151)   family 5 (cellulase) 
58          SEQ ID NO:150 (encoded by, e.g., SEQ ID NO:149)   family 5 (cellulase) 
59          SEQ ID NO:154 (encoded by, e.g., SEQ ID NO:153)   family 5 (cellulase) 
H6          SEQ ID NO:158 (encoded by, e.g., SEQ ID NO:157)   family 5 (cellulase) 
60          SEQ ID NO:156 (encoded by, e.g., SEQ ID NO:155)   family 5 (cellulase) 


All genomic clone inserts from hits were sequenced. As with Table 1 above, 
some genomic clones contained more than one open reading frame. For example, one 
such genomic clone contains two open reading frames annotated as Enzymes No. H8 and 
H8a, with said open reading frames having the sequences as depicted in SEQ ID NO:67 
and SEQ ID NO:69, respectively. There was a total of 25 glycosyl hydrolase hits from 17 
of the libraries screened. In general, the hits were from several different glycosyl 
hydrolase families including 5 and 10. Table 2 (above) lists the hits and their identities. 
Several other hits were discovered where the open reading frame was not homologous to 
any known glycosyl hydrolase families. In addition, some of the hits encoded GTP 
cyclohydrolase genes that are known false positives in this system as they create 
fluorescence regardless of substrate degradation. Overall the screen was successful in 
identifying enzymes that were active on MUC. 

Characterization 

Genes discovered in the GIGAMATRIX™ screen were sequenced and the data 
were analyzed. Open reading frames (ORFs) were annotated using a software system. 
The ORFs were subcloned into the appropriate vector(s) with the introduction of DNA 
encoding C-terminal His-tags. Construct DNA was transformed into the appropriate E. 
coli host(s) and expressed for characterization studies. The gene products were screened 
against phosphoric acid-swollen cellulose (PASC). PASC is crystalline cellulose that is 
made more amorphous through swelling by acid treatment. PASC was prepared from 
AVICEL® Microcrystalline Cellulose (MCC). Subclones were grown, expressed and 
lysed. Lysates were incubated with PASC and the reaction products were analyzed using 
the bicinchoninic acid (BCA) reducing sugar assay. The most active subclones were 
selected for larger scale growth and purification. The specific activity of these subclones 
was determined on PASC. 

The subclones were also analyzed by capillary electrophoresis (CE). Lysates were 
incubated with substrate for 30 hours. The reaction products were derivatized with the 
fluorophore 1-aminopyrene-3,6,8-trisulfonate (APTS). The products were analyzed using 
a 48 cm capillary. Cellobiose elutes at 6 minutes. Figure 11 illustrates in graph form data 
showing the activity of selected enzymes against PASC by capillary electrophoresis (CE) 
analysis. Samples H9 through H1 are individual clones. In Figure 11, a number of 
samples had reaction product profiles representative of processive enzymes. A processive 
enzyme is defined as having a ratio of cellobiose / (glucose + cellotriose) 10. Two 
potential processive enzymes that were the most active had specific activities on PASC of 
0.35 and 0.04 U/mg, respectively. 
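
The processivity criterion above, the ratio of cellobiose to (glucose + cellotriose), can be illustrated with a short Python sketch. The comparison operator attached to the value 10 is not legible in the text above, so the sketch assumes "greater than or equal to 10"; that choice, the function names, and the peak areas are hypothetical and are not measured data from this disclosure.

```python
# Assumption: a sample is treated as processive when the ratio is >= 10.
# The text gives the value 10 but the comparison operator is not legible.
ASSUMED_THRESHOLD = 10.0


def processivity_ratio(cellobiose, glucose, cellotriose):
    """Return cellobiose / (glucose + cellotriose) from CE peak areas."""
    denominator = glucose + cellotriose
    if denominator == 0:
        return float("inf")  # only cellobiose was detected
    return cellobiose / denominator


def looks_processive(cellobiose, glucose, cellotriose, threshold=ASSUMED_THRESHOLD):
    """Apply the assumed threshold to one sample's product profile."""
    return processivity_ratio(cellobiose, glucose, cellotriose) >= threshold


# Made-up peak areas (arbitrary units) for two hypothetical samples.
print(looks_processive(cellobiose=120.0, glucose=5.0, cellotriose=5.0))
print(looks_processive(cellobiose=40.0, glucose=10.0, cellotriose=10.0))
```

Peak areas are used directly here; a real analysis would first convert them to molar amounts with a calibration for each APTS-labeled sugar.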

Fungal CBHs in Pichia 

Genes of newly discovered family 6 & 7 fungal cellobiohydrolases were transformed 
into P. pastoris and the transformations were spread onto solid agar plates. 160 colonies 
were selected for each construct. The samples were grown and induced and the 
supernatants were incubated with PASC in the presence of a β-glucosidase. The reaction 
products were analyzed using the glucose-oxidase assay. A glycosyl hydrolase family 6 
cellobiohydrolase was successfully heterologously expressed in P. pastoris. 

Exo-Endo Acting Cellulase 

The wild-type enzyme, a family 9 glycosyl hydrolase discovered in an enzyme 
screen, is a homolog of Thermomonospora fusca E4. E4 has been shown to have both 
endo- and exo-activity. Initial tests of the wild-type enzyme showed it to be active on 
both PASC and AVICEL® Microcrystalline Cellulose (MCC). HPLC analysis of the 
reaction products showed the primary products to be glucose and cellobiose. The wild- 
type enzyme is a multi-domain protein which includes a glycosyl hydrolase family 9 
catalytic domain, a family 3 cellulose binding domain, and three bacterial Ig-like domains 
that are believed to be involved in cell adhesion. Three additional subclone variants of 
the wild-type enzyme were tested to determine the effects of the domains on activity. The 
wild-type enzyme was subcloned with: 1) the catalytic domain alone (CD); 2) the 
catalytic and carbohydrate binding domain (CCD); and 3) the catalytic and carbohydrate binding 
domain plus the 11 downstream amino acids (CCD+11). The full-length protein and the 3 
subclone variants were assayed on AVICEL® Microcrystalline Cellulose (MCC) and the 
reaction products were analyzed by the BCA reducing sugar assay; the data are 
summarized in graphic form in Figure 12. The data illustrated in Figure 12 were 
generated by BCA assay of the wild-type enzyme and truncation mutants incubated with 
AVICEL® Microcrystalline Cellulose (MCC) for 74 hours, 37°C, pH 5. CBH1 is a 
positive control. The negative control is the host without insert. 

The wild-type enzyme, the full-length protein (SEQ ID NO:164, encoded by, e.g., 
SEQ ID NO:163), was the most active. The full-length protein was selected for GSSM 
evolution. The catalytic and the carbohydrate binding domain were evolved. 

GSSM screening 

GSSM technology (discussed above) was used to rapidly and sequentially mutate 
the amino acids of the catalytic and carbohydrate binding domain of the target protein 
into the 19 other amino acids. The goal of the GSSM screen was to identify mutants that 
increased the extent of hydrolysis on insoluble microcrystalline cellulose. A robotic 
screening method was developed to facilitate the GSSM screening process. 

DNA from the mutation constructs was transformed into DH10B host cells. 
Individual colonies were picked into 96 well (shallow) plates containing 150 uL 
LB/Ampicillin using the automatic colony picking system. The plates were incubated for 
24 hours at 37°C, 400 rpm. 15 uL of culture was transferred from each well into an 
induction plate. Each well of the induction plate contained 135 uL LB/Ampicillin with 
1.1 mM IPTG. The induction plates were incubated for 24 hours at 37°C, 400 rpm. The 
plates were centrifuged and the supernatant was discarded. 

The automated portion of the assay began at this point. The cells were lysed and 
resuspended by the robot. 150 uL of lysis buffer (125 uL water plus 25 uL BPER 
containing 0.2 mg/ml lysozyme and 20 unit/ml DNase I) was added to each well. 15 uL 
lysate was transferred from each well to a reaction plate. Each well of the reaction plate 
contained 185 uL of a reaction mix (1% AVICEL® Microcrystalline Cellulose (MCC), 
50 mM sodium acetate buffer pH 5.0). The reaction plates were incubated at 37°C for 30 
hours with 95% humidity. After incubation, the plates were centrifuged and 15 uL 
supernatant was transferred to BCA plates. The BCA plates contained 50 uL reagent A, 
50 uL reagent B, and 80 uL 400 mM Carbonate buffer, pH 10 per well. The plates were 
covered with rubber seals and incubated at 80°C for 30 minutes, then cooled by 
centrifugation and the absorbance read at 560 nm (A560). 
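
As a worked check of the transfer volumes given above, the following Python sketch computes the nominal final concentrations after each transfer. It assumes that volumes are simply additive and that the transferred culture or lysate contributes none of the component in question; the function name is hypothetical.

```python
def diluted_concentration(stock_conc, stock_volume_ul, added_volume_ul):
    """Concentration after adding a volume that contains none of the component."""
    total_volume_ul = stock_volume_ul + added_volume_ul
    return stock_conc * stock_volume_ul / total_volume_ul


# Induction plate: 135 uL LB/Ampicillin at 1.1 mM IPTG plus 15 uL of culture.
iptg_final_mM = diluted_concentration(1.1, 135, 15)

# Reaction plate: 185 uL of 1% AVICEL reaction mix plus 15 uL of lysate.
avicel_final_percent = diluted_concentration(1.0, 185, 15)

# BCA plate: 15 uL supernatant + 50 uL reagent A + 50 uL reagent B + 80 uL buffer.
bca_total_volume_ul = 15 + 50 + 50 + 80

print(round(iptg_final_mM, 3), round(avicel_final_percent, 3), bca_total_volume_ul)
```

Under these assumptions the 1.1 mM IPTG stock corresponds to a nominal induction concentration of about 1 mM, and the substrate loading in the reaction well is slightly below 1%.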

Results 

At least 80 random mutation colonies were screened for each amino acid site. An 
example of the primary GSSM™ screening data is graphically illustrated in Figure 13. 
Column 6 contained the wildtype samples and column 12 contained the host/vector 
negative controls. After a 30 hour incubation with AVICEL® Microcrystalline Cellulose 
(MCC), the signal produced from the wildtype samples was around 0.53, with a standard 
deviation of 0.07. The negative control had an average signal of 0.29. Samples with 
signal higher than the average of positive controls plus 2 times the standard deviation 
were deemed primary hits. From this screening plate, about ten primary hits were 
selected for the secondary confirmation screening. 
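
The hit criterion above (signal greater than the wildtype average plus 2 standard deviations) can be sketched in Python as follows. With the reported wildtype mean of 0.53 and standard deviation of 0.07, the cutoff works out to 0.53 + 2 x 0.07 = 0.67. The per-well readings are invented for illustration, the function names are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def hit_threshold(wildtype_signals, n_sd=2.0):
    """Mean of the wildtype control wells plus n_sd sample standard deviations."""
    return mean(wildtype_signals) + n_sd * stdev(wildtype_signals)


def primary_hits(sample_signals, threshold):
    """Return the wells whose signal is strictly above the threshold."""
    return sorted(well for well, signal in sample_signals.items() if signal > threshold)


# Cutoff implied by the reported summary statistics (mean 0.53, SD 0.07).
reported_threshold = 0.53 + 2 * 0.07

# Invented A560 readings for a few wells; not data from Figure 13.
example_plate = {"E4": 0.81, "G2": 0.74, "H3": 0.70, "A1": 0.55, "B5": 0.66}
print(primary_hits(example_plate, reported_threshold))
```

When raw control readings are available, `hit_threshold` computes the same cutoff directly from the wildtype wells of each plate, which keeps the criterion plate-specific.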

Primary hits were reconfirmed in a secondary assay. This assay was the same as 
the primary screen, except that samples were run in quadruplicate. An example of the 
secondary GSSM screening data is graphically illustrated in Figure 14. Samples in wells 
E3-H3, A4-D4, and A7-D7, on average, had higher activity than the wildtype. These 12 wells 
correspond to 3 hits since the samples were run in quadruplicate. These samples were the 
primary hits shown in wells E4, G2, and H3 in Figure 13 (plate 29805-AA89 BCA plate). 
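
To illustrate how 12 wells map to 3 quadruplicate hits, the following Python sketch expands the well ranges named above into individual wells of a 96 well plate (rows A through H). The helper name is hypothetical, and the sketch assumes each replicate group occupies four consecutive rows of a single column, as the ranges in the text suggest.

```python
ROWS = "ABCDEFGH"  # row letters of a 96 well plate


def expand_range(well_range):
    """Expand a same-column range such as 'E3-H3' into its individual wells."""
    start, end = well_range.split("-")
    column = start[1:]
    if end[1:] != column:
        raise ValueError("range must stay within one column")
    first, last = ROWS.index(start[0]), ROWS.index(end[0])
    return [f"{row}{column}" for row in ROWS[first:last + 1]]


replicate_groups = {name: expand_range(name) for name in ("E3-H3", "A4-D4", "A7-D7")}
total_wells = sum(len(wells) for wells in replicate_groups.values())
print(replicate_groups["E3-H3"], total_wells, len(replicate_groups))
```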

There were 77 hits from the secondary screening. These samples were sequenced. 
Thirty-five of the samples had amino acid changes, 22 had transposon insertions, and the 
rest were wildtype or had deletions. 

Hits from the secondary screen were further analyzed. The GSSM upmutants 
were mapped onto the crystal structure of T. fusca E4. Samples were prioritized based on 
amino acid location, amino acid change and the fold improvement score. Eight 
upmutants were selected from the GSSM screening for gene reassembly 
evolution, i.e., Tunable GeneReassembly (TGR), discussed above, and also see, e.g., U.S. 
Patent No. 6,537,776. 

Table 2. Up-mutants selected for site directed mutagenesis reassembly. 

Residue   OLDAA   NEWAA 
89        M       R 
103       F       G 
110       P       G 
114       Y       L 
157       A       S 
481       W       F 
550       P       N 
590       G       R 


Blending of upmutants 

Using gene reassembly (Tunable GeneReassembly (TGR)) technology, the 
upmutants shown in Table 2, above, were blended in order to identify the candidate with 
the best activity. Activity assays were the same as for the GSSM screening except 
reactions were further diluted to account for increased activity of upmutants over the 
wildtype enzyme. Figure 15 illustrates in graph form data from mixed, or "blended", 
GSSM™ screening assays. 

In summary, the invention provides enzymes having cellulase activity having the 
following sequences based on SEQ ID NO:164 (encoded by, e.g., SEQ ID NO:163): 

Residue   Original     Codons Encoding       New Amino Acid   Codons Encoding New 
          Amino Acid   Original Amino Acid   (after GSSM      Amino Acid 
                                             Evolution) 
89        M            ATG                   R                CGT, CGC, CGA, CGG, AGA, AGG 
103       F            TTT, TTC              G                GGT, GGC, GGA, GGG 
110       P            CCA, CCC, CCG, CCT    G                GGT, GGC, GGA, GGG 
114       Y            TAT, TAC              L                TTA, TTG, CTT, CTC, CTA, CTG 
157       A            GCT, GCC, GCA, GCG    S                TCT, TCC, TCA, TCG, AGT, AGC 
481       W            TGG                   F                TTT, TTC 
550       P            CCA, CCC, CCG, CCT    N                AAT, AAC 
590       G            GGT, GGC, GGA, GGG    R                CGT, CGC, CGA, CGG, AGA, AGG 
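
The codon columns of the table above follow from the standard genetic code. As an illustrative cross-check only, the following Python sketch builds the standard codon table and lists the codons for each original and new amino acid at the eight residues. The variable and function names are hypothetical; the 64-letter amino acid string is the standard genetic code written in T, C, A, G order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_CODE)
}


def codons_for(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


# (residue, original amino acid, new amino acid) as listed in the table above.
SUBSTITUTIONS = [
    (89, "M", "R"), (103, "F", "G"), (110, "P", "G"), (114, "Y", "L"),
    (157, "A", "S"), (481, "W", "F"), (550, "P", "N"), (590, "G", "R"),
]

for residue, old_aa, new_aa in SUBSTITUTIONS:
    print(residue, old_aa, codons_for(old_aa), new_aa, codons_for(new_aa))
```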

A number of aspects of the invention have been described. Nevertheless, it will be 
understood that various modifications may be made without departing from the spirit and 
scope of the invention. Accordingly, other aspects are within the scope of the following 
claims. 

WHAT IS CLAIMED IS: 

1. An isolated or recombinant nucleic acid comprising 

(a) a nucleic acid sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 
56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 
71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 
86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or more 
or complete sequence identity to SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID 
NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, 
SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ 
ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID 
NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID 
NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID 
NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID 
NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID 
NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID 
NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID 
NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID 
NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID 
NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID 
NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID 
NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147, SEQ ID 
NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID NO:157, SEQ ID 
NO:159, SEQ ID NO:161, SEQ ID NO:163 or SEQ ID NO:165, over a region of at least 
about 20, 30, 40, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 
750, 800, 850, 900, 950, 1000, 1050, 1100, 1150 or more residues, wherein the nucleic 
acid encodes at least one polypeptide having a cellulase activity, 

and optionally the sequence identities are determined by analysis with a sequence 
comparison algorithm or by a visual inspection; or 

(b) a nucleic acid sequence that hybridizes under stringent conditions to a nucleic 
acid comprising SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID 
NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, 
SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ 
ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID 
NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID 
NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID 
NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID 
NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID 
NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID 
NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ ID 
NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQ ID 
NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119, SEQ ID 
NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID 
NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID 
NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID 
NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID 
NO:161, SEQ ID NO:163 or SEQ ID NO:165, wherein the nucleic acid encodes a 
polypeptide having a cellulase activity, and the stringent conditions include a wash step 
comprising a wash in 0.2X SSC at a temperature of about 65°C for about 15 minutes, 

and optionally the nucleic acid is at least about 20, 30, 40, 50, 60, 75, 100, 150, 
200, 300, 400, 500, 600, 700, 800, 900, 1000 or more residues in length or the full length 
of the gene or transcript; 

(c) a nucleic acid sequence encoding a polypeptide having a sequence as set forth 
in SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID 
NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID 
NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID 
NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID 
NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID 
NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID 
NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID 
NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID 
NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID 
NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID 
NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID 
NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID 
NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID 
NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID 
NO:142, SEQ ID NO:144, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID 
NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID 
NO:162, SEQ ID NO:164 or SEQ ID NO:166; or 

(d) a nucleic acid sequence complementary to (a), (b) or (c). 

2. The isolated or recombinant nucleic acid of claim 1, wherein the nucleic 
acid sequence comprises a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID 
NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, 
SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ 
ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID 
NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID 
NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID 
NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID 
NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID 
NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID 
NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID 
NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID 
NO:107, SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID 
NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID 
NO:127, SEQ ID NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID 
NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID 
NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID 
NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163 or SEQ ID NO:165. 

3. The isolated or recombinant nucleic acid of claim 1, wherein the sequence 
comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set 
to blastall -p blastp -d "nr pataa" -F F, and all other options are set to default. 

4. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises an endoglucanase activity. 

5. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises a cellobiohydrolase activity. 

6. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises a β-glucosidase or mannanase activity. 

7. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises an endocellulase activity. 

8. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises hydrolyzing a glucan to produce a smaller molecular weight 
polysaccharide or oligomer. 

9. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises catalyzing hydrolysis of 1,4-beta-D-glycosidic linkages. 

10. The isolated or recombinant nucleic acid of claim 9, wherein the 
endocellulase activity comprises an endo-1,4-beta-endocellulase activity. 

11. The isolated or recombinant nucleic acid of claim 10, wherein the 1,4- 
beta-D-glycosidic linkage activity comprises hydrolysis of a 1,4-beta-D-glycosidic 
linkage in a cellulose, a cellulose derivative, a lichenin or a cereal. 

12. The isolated or recombinant nucleic acid of claim 11, wherein the cellulose 
derivative comprises a carboxy methyl cellulose or a hydroxy ethyl cellulose. 

13. The isolated or recombinant nucleic acid of claim 11, wherein the cereal 
comprises a beta-D-glucan or a xyloglucan. 

14. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises catalyzing hydrolysis of glucanase linkages. 

15. The isolated or recombinant nucleic acid of claim 14, wherein the cellulase 
activity comprises catalyzing hydrolysis of β-1,4- and/or β-1,3- glucanase linkages. 

16. The isolated or recombinant nucleic acid of claim 14, wherein the cellulase 
activity comprises catalyzing hydrolysis of endo-glucanase linkages. 

17. The isolated or recombinant nucleic acid of claim 16, wherein the cellulase 
activity comprises catalyzing hydrolysis of endo-1,4-beta-D-glucan 4-glucano hydrolase 
activity. 

18. The isolated or recombinant nucleic acid of claim 16, wherein the cellulase 
activity comprises catalyzing hydrolysis of internal endo-β-1,4- glucanase linkages 
and/or β-1,3- glucanase linkages. 

19. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises catalyzing hydrolysis of internal β-1,3-glucosidic linkages. 

20. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises hydrolyzing polysaccharides comprising glucopyranose. 

21. The isolated or recombinant nucleic acid of claim 20, wherein the 
cellulase activity comprises hydrolyzing polysaccharides comprising 1,4-β-glycoside- 
linked D-glucopyranoses. 
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22. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises hydrolyzing a cellulose, a cellulose derivative or a hemicellulose. 

23. The isolated or recombinant nucleic acid of claim 22, wherein the cellulase
activity comprises hydrolyzing a cellulose or a hemicellulose in a wood or paper pulp or a 
wood or paper product. 

24. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises catalyzing hydrolysis of glucan in a feed, a food product or a beverage. 

25. The isolated or recombinant nucleic acid of claim 24, wherein the feed,
food product or beverage comprises a cereal-based animal feed, a wort or a beer, a dough, 
a fruit or a vegetable. 

26. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity comprises catalyzing hydrolysis of a glucan in a microbial cell, a fungal cell, a
mammalian cell, a plant cell or any plant material comprising a cellulosic part. 

27. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity is thermostable. 

28. The isolated or recombinant nucleic acid of claim 27, wherein the 
polypeptide retains a cellulase activity under conditions comprising a temperature range
of between about 37°C to about 95°C, or between about 55°C to about 85°C, or between
about 70°C to about 75°C, or between about 70°C to about 95°C, or between about 90°C
to about 95°C, or retains a cellulase activity in a temperature in the range between about
1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C,
between about 25°C to about 37°C, or between about 37°C to about 95°C, 96°C, 97°C,
98°C or 99°C.

29. The isolated or recombinant nucleic acid of claim 1, wherein the cellulase 
activity is thermotolerant.

30. The isolated or recombinant nucleic acid of claim 29, wherein the 
polypeptide retains a cellulase activity after exposure to a temperature in the range from 
greater than 37°C to about 95°C, from greater than 55°C to about 85°C, or between about 
70°C to about 75°C, or from greater than 90°C to about 95°C, or after exposure to a
temperature in the range between about 1°C to about 5°C, between about 5°C to about
15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, or between
about 37°C to about 95°C, 96°C, 97°C, 98°C or 99°C.

31. A nucleic acid probe for identifying a nucleic acid encoding a polypeptide 
with a cellulase activity, wherein the probe comprises at least 20, 30, 40, 50, 60, 75, 100
or 150 or more consecutive bases of a sequence as set forth in claim 1, wherein the probe
identifies the nucleic acid by binding or hybridization,

wherein optionally the probe comprises an oligonucleotide comprising at least
about 10 to 50, about 20 to 60, about 30 to 70, about 40 to 80, about 60 to 100, or about
50 to 150 consecutive bases,

wherein optionally the probe comprises consecutive bases of a sequence as set
forth in SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ
ID NO:11, SEQ ID NO:13, SEQ ID NO:15.

32. An amplification primer pair for amplifying a nucleic acid encoding a
polypeptide having a cellulase activity, wherein the amplification primer pair

(a) is capable of amplifying a nucleic acid comprising a sequence as set forth in 
claim 1, or a subsequence thereof; or 

(b) comprises a first member having a sequence as set forth by about the first (the 
5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more 

residues of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9,
SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ
ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID
NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID
NO:41, SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID
NO:51, SEQ ID NO:53, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID
NO:61, SEQ ID NO:63, SEQ ID NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID
NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID
NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID
NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ ID
NO:101, SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQ ID
NO:111, SEQ ID NO:113, SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119, SEQ ID
NO:121, SEQ ID NO:123, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID
NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID
NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID
NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID
NO:161, SEQ ID NO:163 or SEQ ID NO:165, and a second member having a sequence
as set forth by about the first (the 5') 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30 or more residues of the complementary strand of the first member,

wherein optionally a member of the amplification primer pair comprises an
oligonucleotide comprising at least about 10 to 50 consecutive bases of the sequence, or,
about 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more
consecutive bases of the sequence.
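
For illustration only, and not as part of the claims, the primer-pair construction recited in claim 32(b) can be sketched in Python. The sketch assumes one reading of the claim language: the first member is the first (5') n residues of a target sequence, and the second member is the first (5') n residues of that sequence's complementary strand. The function names `reverse_complement` and `primer_pair`, the default length, and the example sequence are hypothetical and are not taken from the sequence listing.

```python
from typing import Tuple

# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def primer_pair(template: str, length: int = 20) -> Tuple[str, str]:
    """Return (first member, second member) primers of the given length.

    Assumption: the first member is the 5' end of the template and the
    second member is the 5' end of the template's complementary strand.
    The lower bound of 12 residues follows the shortest length in claim 32(b).
    """
    if not 12 <= length <= len(template):
        raise ValueError("length must be at least 12 and no longer than the template")
    first = template[:length]
    second = reverse_complement(template)[:length]
    return first, second


if __name__ == "__main__":
    # Hypothetical 30-base template used only to exercise the functions.
    example = "ATGGCCAAGTTCCTGAAACGTCTGGTTACC"
    print(primer_pair(example, 12))  # ('ATGGCCAAGTTC', 'GGTAACCAGACG')
```

A real primer design would also weigh melting temperature, GC content and self-complementarity, none of which this sketch considers.
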

33. A cellulase-encoding nucleic acid generated by amplification of a
polynucleotide using an amplification primer pair as set forth in claim 32, wherein
optionally the amplification is by polymerase chain reaction (PCR).

34. The cellulase-encoding nucleic acid of claim 33, wherein the nucleic acid
is generated by amplification of a gene library, and optionally the gene library is an
environmental library.

35. An isolated or recombinant cellulase encoded by the cellulase-encoding 
nucleic acid set forth in claim 33. 

36. A method of amplifying a nucleic acid encoding a polypeptide having a 
cellulase activity comprising amplification of a template nucleic acid with an
amplification primer pair as set forth in claim 32.

37. An expression cassette comprising a nucleic acid comprising a sequence as 
set forth in claim 1.

38. A vector comprising a nucleic acid comprising a sequence as set forth in 
claim 1, wherein optionally the vehicle comprises an expression vector. 

39. A cloning vehicle comprising a nucleic acid comprising a sequence as set
forth in claim 1,

wherein optionally the cloning vehicle comprises a viral vector, a plasmid, a 
phage, a phagemid, a cosmid, a fosmid, a bacteriophage or an artificial chromosome, 

and optionally the viral vector comprises an adenovirus vector, a retroviral vector 
or an adeno-associated viral vector, and optionally the cloning vehicle comprises a
bacterial artificial chromosome (BAC), a plasmid, a bacteriophage P1-derived vector
(PAC), a yeast artificial chromosome (YAC), or a mammalian artificial chromosome 
(MAC). 

40. A transformed cell comprising a nucleic acid comprising a sequence as set
forth in claim 1, or an expression cassette as set forth in claim 37, the vector of claim 38,
or a cloning vehicle as set forth in claim 39, 

wherein optionally the cell is a bacterial cell, a mammalian cell, a fungal cell, a 
yeast cell, an insect cell or a plant cell.

41. A transgenic non-human animal comprising a sequence as set forth in
claim 1, wherein optionally the transgenic non-human animal is a mouse or a rat.

42. A transgenic plant comprising a sequence as set forth in claim 1, 
wherein optionally the plant is a corn plant, a sorghum plant, a potato plant, a
tomato plant, a wheat plant, an oilseed plant, a rapeseed plant, a soybean plant, a rice
plant, a barley plant, a grass, or a tobacco plant.

43. A transgenic seed comprising a sequence as set forth in claim 1, 
wherein optionally the seed is a corn seed, a wheat kernel, an oilseed, a rapeseed,
a soybean seed, a palm kernel, a sunflower seed, a sesame seed, a rice, a barley, a peanut
or a tobacco plant seed.

44. An antisense oligonucleotide comprising a nucleic acid sequence 
complementary to or capable of hybridizing under stringent conditions to a sequence as 
set forth in claim 1, or a subsequence thereof,

wherein optionally the antisense oligonucleotide has a length of between about 10
to 50, about 20 to 60, about 30 to 70, about 40 to 80, or about 60 to 100 bases.

45. A method of inhibiting the translation of a cellulase message in a cell
comprising administering to the cell or expressing in the cell an antisense oligonucleotide
comprising a nucleic acid sequence complementary to or capable of hybridizing under
stringent conditions to a sequence as set forth in claim 1. 

46. A double-stranded interference RNA (RNAi) molecule comprising a 
subsequence of a sequence as set forth in claim 1, 

wherein optionally the RNAi comprises an siRNA or an miRNA, and optionally 
the RNAi molecule is about 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 or 
more duplex nucleotides in length. 

47. A method of inhibiting the expression of a cellulase in a cell comprising 
administering to the cell or expressing in the cell a double-stranded interference RNA 
(RNAi) molecule as set forth in claim 46. 

48. An isolated or recombinant polypeptide 

(i) having an amino acid sequence having at least 50%, 51%, 52%, 53%, 54%, 
55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 
70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 
85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 
more, or 100% sequence identity to SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ
ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID
NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID
NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID
NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID
NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID
NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID
NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID
NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID
NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID
NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID
NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID
NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID
NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID
NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:146, SEQ ID
NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID
NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164 or SEQ ID NO:166, over a
region of at least about 20, 25, 30, 35, 40, 45, 50, 55, 60, 75, 100, 150, 200, 250, 300 or
more residues,

wherein optionally the sequence identities are determined by analysis with a
sequence comparison algorithm or by a visual inspection, and optionally the sequence
comparison algorithm is a BLAST version 2.2.2 algorithm where a filtering setting is set
to blastall -p blastp -d "nr pataa" -F F, and all other options are set to default;

(ii) having an amino acid sequence encoded by a nucleic acid as set forth in claim 
1, wherein the polypeptide has a cellulase activity or has immunogenic activity in that it is 
capable of generating an antibody that specifically binds to a polypeptide having a 
sequence as set forth in SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6, SEQ ID NO:8,
SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18, SEQ
ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID
NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID
NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID
NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID NO:56, SEQ ID NO:58, SEQ ID
NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:68, SEQ ID
NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78, SEQ ID
NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID
NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID
NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID
NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:118, SEQ ID
NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126, SEQ ID NO:128, SEQ ID
NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138, SEQ ID
NO:140, SEQ ID NO:142, SEQ ID NO:143, SEQ ID NO:146, SEQ ID NO:148, SEQ ID
NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID
NO:160, SEQ ID NO:162, SEQ ID NO:164 or SEQ ID NO:166; or

(iii) having an amino acid sequence as set forth in (i) or (ii), or a polypeptide 
encoded by a nucleic acid as set forth in claim 1, and comprising at least one amino acid 
residue conservative substitution,

wherein optionally conservative substitution comprises replacement of an 
aliphatic amino acid with another aliphatic amino acid; replacement of a serine with a 
threonine or vice versa; replacement of an acidic residue with another acidic residue; 
replacement of a residue bearing an amide group with another residue bearing an amide 
group; exchange of a basic residue with another basic residue; or, replacement of an
aromatic residue with another aromatic residue, or a combination thereof, 

and optionally the aliphatic residue comprises Alanine, Valine, Leucine,
Isoleucine or a synthetic equivalent thereof; the acidic residue comprises Aspartic acid,
Glutamic acid or a synthetic equivalent thereof; the residue comprising an amide group
comprises Aspartic acid, Glutamic acid or a synthetic equivalent thereof; the basic residue
comprises Lysine, Arginine or a synthetic equivalent thereof; or, the aromatic residue
comprises Phenylalanine, Tyrosine or a synthetic equivalent thereof.

49. The isolated or recombinant polypeptide of claim 48, wherein the cellulase 
activity comprises an endoglucanase activity. 

50. The isolated or recombinant polypeptide of claim 48, wherein the cellulase
activity comprises a cellobiohydrolase activity.

51. The isolated or recombinant polypeptide of claim 48, wherein the
cellulase activity comprises a β-glucosidase or mannanase activity.

52. The isolated or recombinant polypeptide of claim 48, wherein the cellulase 
activity comprises an endocellulase activity.

53. The isolated or recombinant polypeptide of claim 48, wherein the cellulase
activity comprises hydrolyzing a glucan to produce a smaller molecular weight
polysaccharide or oligomer.

54. The isolated or recombinant polypeptide of claim 48, wherein the cellulase
activity comprises catalyzing hydrolysis of 1,4-beta-D-glycosidic linkages.

55. The isolated or recombinant polypeptide of claim 54, wherein the 
endocellulase activity comprises an endo-1,4-beta-endocellulase activity.

56. The isolated or recombinant polypeptide of claim 54, wherein the
1,4-beta-D-glycosidic linkage activity comprises hydrolysis of a 1,4-beta-D-glycosidic linkage in a
cellulose, a cellulose derivative, a lichenin or a cereal. 

57. The isolated or recombinant polypeptide of claim 56, wherein the cellulose 
derivative comprises a carboxy methyl cellulose or a hydroxy ethyl cellulose. 

58. The isolated or recombinant polypeptide of claim 56, wherein the cereal 
comprises a beta-D-glucan or a xyloglucan. 

59. The isolated or recombinant polypeptide of claim 48, wherein the cellulase 
activity comprises catalyzing hydrolysis of glucanase linkages. 

60. The isolated or recombinant polypeptide of claim 59, wherein the cellulase 
activity comprises catalyzing hydrolysis of β-1,4- and/or β-1,3- glucanase linkages.

61. The isolated or recombinant polypeptide of claim 59, wherein the cellulase
activity comprises catalyzing hydrolysis of endo-glucanase linkages. 

62. The isolated or recombinant polypeptide of claim 61, wherein the cellulase 
activity comprises catalyzing hydrolysis of endo-1,4-beta-D-glucan 4-glucano hydrolase
activity. 

63. The isolated or recombinant polypeptide of claim 61, wherein the cellulase
activity comprises catalyzing hydrolysis of internal endo-β-1,4- glucanase linkages
and/or β-1,3- glucanase linkages.

64. The isolated or recombinant polypeptide of claim 48, wherein the cellulase 
activity comprises catalyzing hydrolysis of internal β-1,3-glucosidic linkages.

65. The isolated or recombinant polypeptide of claim 48, wherein the cellulase
activity comprises hydrolyzing polysaccharides comprising glucopyranose. 

66. The isolated or recombinant polypeptide of claim 65, wherein the 
cellulase activity comprises hydrolyzing polysaccharides comprising
1,4-β-glycoside-linked D-glucopyranoses.

67. The isolated or recombinant polypeptide of claim 48, wherein the cellulase 
activity comprises hydrolyzing a cellulose, a cellulose derivative or a hemicellulose.

68. The isolated or recombinant polypeptide of claim 67, wherein the cellulase
activity comprises hydrolyzing a cellulose or a hemicellulose in a wood or paper pulp or a
wood or paper product.

69. The isolated or recombinant polypeptide of claim 48, wherein the cellulase
activity comprises catalyzing hydrolysis of glucan in a feed, a food product or a beverage.

70. The isolated or recombinant polypeptide of claim 69, wherein the feed,
food product or beverage comprises a cereal-based animal feed, a wort or a beer, a dough, 
a fruit or a vegetable. 

71. The isolated or recombinant polypeptide of claim 48, wherein the cellulase 
activity comprises catalyzing hydrolysis of a glucan in a microbial cell, a fungal cell, a
mammalian cell, a plant cell or any plant material comprising a cellulosic part.

72. The isolated or recombinant polypeptide of claim 48, wherein the 
cellulase activity is thermostable. 

73. The isolated or recombinant polypeptide of claim 72, wherein the
polypeptide retains a cellulase activity under conditions comprising a temperature range
of between about 37°C to about 95°C, or between about 55°C to about 85°C, or between
about 70°C to about 75°C, or between about 70°C to about 95°C, or between about 90°C
to about 95°C, or retains a cellulase activity in a temperature in the range between about
1°C to about 5°C, between about 5°C to about 15°C, between about 15°C to about 25°C,
between about 25°C to about 37°C, or between about 37°C to about 95°C, 96°C, 97°C,
98°C or 99°C.

74. The isolated or recombinant polypeptide of claim 48, wherein the cellulase
activity is thermotolerant.

75. The isolated or recombinant polypeptide of claim 74, wherein the
polypeptide retains a cellulase activity after exposure to a temperature in the range from
greater than 37°C to about 95°C, from greater than 55°C to about 85°C, or between about
70°C to about 75°C, or from greater than 90°C to about 95°C, or after exposure to a
temperature in the range between about 1°C to about 5°C, between about 5°C to about
15°C, between about 15°C to about 25°C, between about 25°C to about 37°C, or between
about 37°C to about 95°C, 96°C, 97°C, 98°C or 99°C.

76. The isolated or recombinant polypeptide comprising a polypeptide as set 
forth in claim 48 and lacking a signal or leader sequence or a prepro sequence.

77. An isolated or recombinant polypeptide comprising a polypeptide as set
forth in claim 48 and having a heterologous signal or leader sequence or a heterologous 
prepro sequence. 

78. The isolated or recombinant polypeptide of claim 48, wherein the cellulase 
activity comprises a specific activity at about 37°C in the range from about 100 to about
1000 units per milligram of protein, from about 500 to about 750 units per milligram of
protein, from about 500 to about 1200 units per milligram of protein, or from about 750 to
about 1000 units per milligram of protein.

79. The isolated or recombinant polypeptide of claim 48, wherein the
thermotolerance comprises retention of at least half of the specific activity of the cellulase
at 37°C after being heated to an elevated temperature, or, wherein the thermotolerance
comprises retention of specific activity at 37°C in the range from about 500 to about 1200
units per milligram of protein after being heated to an elevated temperature.

80. The isolated or recombinant polypeptide of claim 48, wherein the
polypeptide comprises at least one glycosylation site, and optionally the glycosylation is
an N-linked glycosylation, and optionally the polypeptide is glycosylated after being
expressed in a P. pastoris or a S. pombe.

81. The isolated or recombinant polypeptide of claim 48, wherein the
polypeptide retains a cellulase activity under conditions comprising about pH 6.5, pH 6.0,
pH 5.5, 5.0, pH 4.5 or 4.0 or more acidic, or after exposure to conditions comprising
about pH 6.5, pH 6.0, pH 5.5, 5.0, pH 4.5 or 4.0 or more acidic.

82. The isolated or recombinant polypeptide of claim 48, wherein the 
polypeptide retains a cellulase activity under conditions comprising about pH 7.5, pH 8.0,
pH 8.5, pH 9, pH 9.5, pH 10 or pH 10.5 or more basic, or after exposure to conditions
comprising about pH 7.5, pH 8.0, pH 8.5, pH 9, pH 9.5, pH 10 or pH 10.5 or more basic.

83. A protein preparation comprising a polypeptide as set forth in claim 48,
wherein the protein preparation comprises a liquid, a solid or a gel. 

84. A heterodimer comprising a polypeptide as set forth in claim 48 and a 
second domain, wherein optionally the second domain is a polypeptide and the
heterodimer is a fusion protein, and optionally the second domain comprises an epitope,
an immunogenic peptide or a tag.

85. A homodimer comprising a polypeptide as set forth in claim 48. 

86. An immobilized polypeptide or an immobilized nucleic acid, wherein the
polypeptide comprises a sequence as set forth in claim 48, or a subsequence thereof, or
the nucleic acid comprises a sequence as set forth in claim 1, or a subsequence thereof, or
the probe as set forth in claim 31, wherein optionally the polypeptide or nucleic acid is
immobilized on a cell, a metal, a resin, a polymer, a ceramic, a glass, a microelectrode, a
graphitic particle, a bead, a gel, a plate, an array or a capillary tube.

87. An array comprising an immobilized polypeptide as set forth in claim 86,
or, an immobilized nucleic acid as set forth in claim 86.

88. An isolated or recombinant antibody that specifically binds to a 
polypeptide as set forth in claim 48, wherein optionally the antibody is a monoclonal or a 
polyclonal antibody. 

89. A hybridoma comprising an antibody that specifically binds to a
polypeptide as set forth in claim 48.

90. A method of isolating or identifying a polypeptide with a cellulase activity 
comprising the steps of: 

(a) providing an antibody as set forth in claim 88; 

(b) providing a sample comprising polypeptides; and

(c) contacting the sample of step (b) with the antibody of step (a) under conditions
wherein the antibody can specifically bind to the polypeptide, thereby isolating or
identifying a polypeptide having a cellulase activity.

91. A method of making an anti-cellulase antibody comprising 

(a) administering to a non-human animal a nucleic acid as set forth in claim 1 or a
subsequence thereof in an amount sufficient to generate a humoral immune response,
thereby making an anti-cellulase antibody, or

(b) administering to a non-human animal a polypeptide as set forth in claim 48 or
a subsequence thereof in an amount sufficient to generate a humoral immune response,
thereby making an anti-cellulase antibody.

92. A method of producing a recombinant polypeptide comprising the steps of:
(a) providing a nucleic acid operably linked to a promoter, wherein the nucleic acid
comprises a sequence as set forth in claim 1; and (b) expressing the nucleic acid of step
(a) under conditions that allow expression of the polypeptide, thereby producing a
recombinant polypeptide,

wherein optionally the method further comprises transforming a host cell with the
nucleic acid of step (a) followed by expressing the nucleic acid of step (a), thereby
producing a recombinant polypeptide in a transformed cell.

93. A method for identifying a polypeptide having a cellulase activity
comprising the following steps:

(a) providing a polypeptide as set forth in claim 48;

(b) providing a cellulase substrate; and

(c) contacting the polypeptide with the substrate of step (b) and detecting a
decrease in the amount of substrate or an increase in the amount of a reaction product,
wherein a decrease in the amount of the substrate or an increase in the amount of the
reaction product detects a polypeptide having a cellulase activity.

94. A method for identifying a cellulase substrate comprising the following
steps:

(a) providing a polypeptide as set forth in claim 48;

(b) providing a test substrate; and

(c) contacting the polypeptide of step (a) with the test substrate of step (b) and
detecting a decrease in the amount of substrate or an increase in the amount of reaction
product, wherein a decrease in the amount of the substrate or an increase in the amount of
a reaction product identifies the test substrate as a cellulase substrate.

95. A method of determining whether a test compound specifically binds to a 
polypeptide comprising the following steps: 

(a) expressing a nucleic acid or a vector comprising the nucleic acid under
conditions permissive for translation of the nucleic acid to a polypeptide, wherein the
nucleic acid has a sequence as set forth in claim 1;

(b) providing a test compound;

(c) contacting the polypeptide with the test compound; and

(d) determining whether the test compound of step (b) specifically binds to the
polypeptide.

96. A method of determining whether a test compound specifically binds to a
polypeptide comprising the following steps:

(a) providing a polypeptide as set forth in claim 48;

(b) providing a test compound; 

(c) contacting the polypeptide with the test compound; and 

(d) determining whether the test compound of step (b) specifically binds to the 
polypeptide. 

97. A method for identifying a modulator of a cellulase activity comprising the 
following steps: 



(a) providing a polypeptide as set forth in claim 48; 

(b) providing a test compound; 

(c) contacting the polypeptide of step (a) with the test compound of step
(b) and measuring an activity of the glucanase, wherein a change in the cellulase activity
measured in the presence of the test compound compared to the activity in the absence of
the test compound provides a determination that the test compound modulates the
cellulase activity.

98. The method of claim 97, wherein the cellulase activity is measured by 
providing a cellulase substrate and detecting a decrease in the amount of the substrate or
an increase in the amount of a reaction product, or, an increase in the amount of the
substrate or a decrease in the amount of a reaction product,

wherein optionally a decrease in the amount of the substrate or an increase in the
amount of the reaction product with the test compound as compared to the amount of
substrate or reaction product without the test compound identifies the test compound as
an activator of a cellulase activity,

and optionally an increase in the amount of the substrate or a decrease in the
amount of the reaction product with the test compound as compared to the amount of
substrate or reaction product without the test compound identifies the test compound as
an inhibitor of a cellulase activity.
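
For illustration only, and not as part of the claims, the comparison recited in claim 98 can be sketched in Python using the amount of reaction product as the readout. The function name `classify_modulator`, the `tolerance` parameter (a hypothetical allowance for measurement noise) and the numeric values are assumptions made for this sketch; the claim itself does not specify units or thresholds.

```python
def classify_modulator(product_with: float,
                       product_without: float,
                       tolerance: float = 0.0) -> str:
    """Classify a test compound from reaction-product amounts.

    product_with:    product measured in the presence of the test compound
    product_without: product measured in the absence of the test compound
    tolerance:       hypothetical noise margin; differences within it are
                     treated as no detectable effect
    """
    difference = product_with - product_without
    if difference > tolerance:
        # More product with the compound: activator of the cellulase activity.
        return "activator"
    if difference < -tolerance:
        # Less product with the compound: inhibitor of the cellulase activity.
        return "inhibitor"
    return "no effect"


if __name__ == "__main__":
    # Hypothetical product amounts in arbitrary units.
    print(classify_modulator(12.0, 8.0))  # activator
    print(classify_modulator(3.0, 8.0))   # inhibitor
```

The same comparison could be run on remaining substrate instead of product, with the two outcomes swapped.
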

99. A computer system comprising a processor and a data storage device
wherein said data storage device has stored thereon a polypeptide sequence or a nucleic
acid sequence, wherein the polypeptide sequence comprises sequence as set forth in claim
48, a polypeptide encoded by a nucleic acid as set forth in claim 1,

wherein optionally the method further comprises a sequence comparison
algorithm and a data storage device having at least one reference sequence stored thereon,
or further comprises an identifier that identifies one or more features in said sequence,

and optionally the sequence comparison algorithm comprises a computer program 
that indicates polymorphisms. 

100. A computer readable medium having stored thereon a polypeptide 
sequence or a nucleic acid sequence, wherein the polypeptide sequence comprises a
polypeptide as set forth in claim 48, or a polypeptide encoded by a nucleic acid as set
forth in claim 1. 

101. A method for identifying a feature in a sequence comprising the steps of: 
(a) reading the sequence using a computer program which identifies one or more features
in a sequence, wherein the sequence comprises a polypeptide sequence or a nucleic acid
sequence, wherein the polypeptide sequence comprises a polypeptide as set forth in claim 
48; a polypeptide encoded by a nucleic acid as set forth in claim 1; and (b) identifying 
one or more features in the sequence with the computer program. 

102. A method for comparing a first sequence to a second sequence comprising 
the steps of: (a) reading the first sequence and the second sequence through use of a
computer program which compares sequences, wherein the first sequence comprises a
polypeptide sequence or a nucleic acid sequence, wherein the polypeptide sequence
comprises a polypeptide as set forth in claim 48 or a polypeptide encoded by a nucleic
acid as set forth in claim 1; and (b) determining differences between the first sequence
and the second sequence with the computer program,

wherein optionally the method further comprises a step of determining differences
between the first sequence and the second sequence, or optionally the method further
comprises the step of identifying polymorphisms, or optionally the method further
comprises use of an identifier that identifies one or more features in a sequence,

and optionally the method comprises reading the first sequence using a computer 
program and identifying one or more features in the sequence. 

103. A method for isolating or recovering a nucleic acid encoding a polypeptide 
with a cellulase activity from an environmental sample comprising the steps of: 

(a) providing an amplification primer pair as set forth in claim 32; 

(b) isolating a nucleic acid from the environmental sample or treating the
environmental sample such that nucleic acid in the sample is accessible for hybridization
to the amplification primer pair; and,

(c) combining the nucleic acid of step (b) with the amplification primer pair of
step (a) and amplifying nucleic acid from the environmental sample, thereby isolating or
recovering a nucleic acid encoding a polypeptide with a cellulase activity from an
environmental sample.

104. A method for isolating or recovering a nucleic acid encoding a polypeptide 
with a cellulase activity from an environmental sample comprising the steps of:

(a) providing a polynucleotide probe comprising a sequence as set forth in claim 
1, or a subsequence thereof, or a probe as set forth in claim 31; 

(b) isolating a nucleic acid from the environmental sample or treating the 
environmental sample such that nucleic acid in the sample is accessible for hybridization 
to a polynucleotide probe of step (a); 

(c) combining the isolated nucleic acid or the treated environmental sample of step 
(b) with the polynucleotide probe of step (a); and 

(d) isolating a nucleic acid that specifically hybridizes with the polynucleotide 
probe of step (a), thereby isolating or recovering a nucleic acid encoding a polypeptide 
with a cellulase activity from an environmental sample. 

105. The method of claim 103 or claim 104, wherein the environmental sample 
comprises a water sample, a liquid sample, a soil sample, an air sample or a biological 
sample, and optionally the biological sample is derived from a bacterial cell, a protozoan 
cell, an insect cell, a yeast cell, a plant cell, a fungal cell or a mammalian cell. 

106. A method of generating a variant of a nucleic acid encoding a polypeptide 
with a cellulase activity comprising the steps of: 

(a) providing a template nucleic acid comprising a sequence as set forth in claim 
1; and 

(b) modifying, deleting or adding one or more nucleotides in the template 
sequence, or a combination thereof, to generate a variant of the template nucleic acid, 

wherein optionally the method further comprises expressing the variant nucleic 
acid to generate a variant cellulase polypeptide, 

and optionally the modifications, additions or deletions are introduced by a 
method comprising error-prone PCR, shuffling, oligonucleotide-directed mutagenesis, 
assembly PCR, sexual PCR mutagenesis, in vivo mutagenesis, cassette mutagenesis, 
recursive ensemble mutagenesis, exponential ensemble mutagenesis, site-specific 
mutagenesis, gene reassembly, Gene Site Saturation Mutagenesis (GSSM), synthetic 
ligation reassembly (SLR), recombination, recursive sequence recombination, 
phosphothioate-modified DNA mutagenesis, uracil-containing template mutagenesis, 
gapped duplex mutagenesis, point mismatch repair mutagenesis, repair-deficient host 
strain mutagenesis, chemical mutagenesis, radiogenic mutagenesis, deletion mutagenesis, 
restriction-selection mutagenesis, restriction-purification mutagenesis, artificial gene 
synthesis, ensemble mutagenesis, chimeric nucleic acid multimer creation and a 
combination thereof, 

and optionally the method is iteratively repeated until a cellulase having an altered 
or different activity or an altered or different stability from that of a polypeptide encoded 
by the template nucleic acid is produced. 

107. The method of claim 106, wherein the variant cellulase polypeptide: (a) is 
thermotolerant, and retains some activity after being exposed to an elevated temperature; 
(b) has increased glycosylation as compared to the cellulase encoded by a template 
nucleic acid; or, (c) has a cellulase activity under a high temperature, wherein the 
cellulase encoded by the template nucleic acid is not active under the high temperature. 

108. The method of claim 106, wherein the method is iteratively repeated until 
(a) a cellulase coding sequence having an altered codon usage from that of the template 
nucleic acid is produced, or, (b) a cellulase gene having higher or lower level of message 
expression or stability from that of the template nucleic acid is produced. 

109. A method for modifying codons in a nucleic acid encoding a polypeptide 
with a cellulase activity to increase its expression in a host cell, the method comprising 
the following steps: 

(a) providing a nucleic acid encoding a polypeptide with a cellulase activity 
comprising a sequence as set forth in claim 1; and, 

(b) identifying a non-preferred or a less preferred codon in the nucleic acid of step 
(a) and replacing it with a preferred or neutrally used codon encoding the same amino 
acid as the replaced codon, wherein a preferred codon is a codon over-represented in 
coding sequences in genes in the host cell and a non-preferred or less preferred codon is 
a codon under-represented in coding sequences in genes in the host cell, thereby 
modifying the nucleic acid to increase its expression in a host cell. 
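
By way of illustration only, the codon replacement recited in claim 109 might be sketched as follows. The usage table HOST_CODON_USAGE, its frequency values, the 0.15 threshold and the function name optimize_codons are all hypothetical assumptions chosen for the example; they do not describe any particular host cell. Only leucine and arginine codons are tabulated, and codons absent from the table are left unchanged.

```python
# Hypothetical relative usage of synonymous codons in a host cell.
# The values are illustrative assumptions, not measured frequencies.
HOST_CODON_USAGE = {
    "L": {"CTG": 0.50, "TTA": 0.13, "TTG": 0.13,
          "CTT": 0.10, "CTC": 0.10, "CTA": 0.04},
    "R": {"CGC": 0.40, "CGT": 0.38, "CGG": 0.10,
          "CGA": 0.06, "AGA": 0.04, "AGG": 0.02},
}


def optimize_codons(cds, usage=HOST_CODON_USAGE, threshold=0.15):
    """Replace under-represented codons with the most used synonymous codon.

    A codon counts as non-preferred when its relative usage is below
    `threshold` (an assumed cutoff). The encoded amino acid never changes.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codon_to_aa = {c: aa for aa, freqs in usage.items() for c in freqs}
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        aa = codon_to_aa.get(codon)
        if aa is not None and usage[aa][codon] < threshold:
            # Swap in the preferred codon for the same amino acid.
            codon = max(usage[aa], key=usage[aa].get)
        out.append(codon)
    return "".join(out)


# Hypothetical input: ATG (Met), CTA (Leu), AGA (Arg), CTG (Leu).
print(optimize_codons("ATGCTAAGACTG"))
```

Reversing the comparison, so that a preferred codon is replaced by the least used synonymous codon, would give the corresponding sketch for decreasing expression.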

110. A method for modifying codons in a nucleic acid encoding a cellulase 
polypeptide, the method comprising the following steps: 

(a) providing a nucleic acid encoding a polypeptide with a cellulase activity 
comprising a sequence as set forth in claim 1; and, 

(b) identifying a codon in the nucleic acid of step (a) and replacing it with a 
different codon encoding the same amino acid as the replaced codon, thereby modifying 
codons in a nucleic acid encoding a cellulase. 

111. A method for modifying codons in a nucleic acid encoding a cellulase 
polypeptide to increase its expression in a host cell, the method comprising the following 
steps: 

(a) providing a nucleic acid encoding a cellulase polypeptide comprising a 
sequence as set forth in claim 1; and, 

(b) identifying a non-preferred or a less preferred codon in the nucleic acid of step 
(a) and replacing it with a preferred or neutrally used codon encoding the same amino 
acid as the replaced codon, wherein a preferred codon is a codon over-represented in 
coding sequences in genes in the host cell and a non-preferred or less preferred codon is 
a codon under-represented in coding sequences in genes in the host cell, thereby 
modifying the nucleic acid to increase its expression in a host cell. 

112. A method for modifying a codon in a nucleic acid encoding a polypeptide 
having a cellulase activity to decrease its expression in a host cell, the method comprising 
the following steps: 

(a) providing a nucleic acid encoding a cellulase polypeptide comprising a 
sequence as set forth in claim 1; and 

(b) identifying at least one preferred codon in the nucleic acid of step (a) and 
replacing it with a non-preferred or less preferred codon encoding the same amino acid 
as the replaced codon, wherein a preferred codon is a codon over-represented in coding 
sequences in genes in a host cell and a non-preferred or less preferred codon is a codon 
under-represented in coding sequences in genes in the host cell, thereby modifying the 
nucleic acid to decrease its expression in a host cell, 

wherein optionally the host cell is a bacterial cell, a fungal cell, an insect cell, a 
yeast cell, a plant cell or a mammalian cell. 

113. A method for producing a library of nucleic acids encoding a plurality of 
modified cellulase active sites or substrate binding sites, wherein the modified active sites 
or substrate binding sites are derived from a first nucleic acid comprising a sequence 
encoding a first active site or a first substrate binding site, the method comprising the 
following steps: 

(a) providing a first nucleic acid encoding a first active site or first substrate 
binding site, wherein the first nucleic acid sequence comprises a sequence that hybridizes 
under stringent conditions to a sequence as set forth in SEQ ID NO:1, SEQ ID NO:3, 
SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID 
NO:15, SEQ ID NO:17, SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID 
NO:25, SEQ ID NO:27, SEQ ID NO:29, SEQ ID NO:31, SEQ ID NO:33, SEQ ID 
NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41, SEQ ID NO:43, SEQ ID 
NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53, SEQ ID 
NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID 
NO:65, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID 
NO:75, SEQ ID NO:77, SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID 
NO:85, SEQ ID NO:87, SEQ ID NO:89, SEQ ID NO:91, SEQ ID NO:93, SEQ ID 
NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101, SEQ ID NO:103, SEQ ID 
NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113, SEQ ID 
NO:115, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID 
NO:125, SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID 
NO:135, SEQ ID NO:137, SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID 
NO:145, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:151, SEQ ID NO:153, SEQ ID 
NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161, SEQ ID NO:163 or SEQ 
ID NO:165, or a subsequence thereof, and the nucleic acid encodes a cellulase active site 
or a cellulase substrate binding site; 

(b) providing a set of mutagenic oligonucleotides that encode naturally-occurring 
amino acid variants at a plurality of targeted codons in the first nucleic acid; and, 

(c) using the set of mutagenic oligonucleotides to generate a set of active site- 
encoding or substrate binding site-encoding variant nucleic acids encoding a range of 
amino acid variations at each amino acid codon that was mutagenized, thereby producing 
a library of nucleic acids encoding a plurality of modified cellulase active sites or 
substrate binding sites, 

wherein optionally a mutagenic oligonucleotide or a variant nucleic acid is 
generated by a method comprising an optimized directed evolution system, Gene 
Site-Saturation Mutagenesis (GSSM), or a synthetic ligation reassembly (SLR), error-prone 
PCR, shuffling, oligonucleotide-directed mutagenesis, assembly PCR, sexual PCR 
mutagenesis, in vivo mutagenesis, cassette mutagenesis, recursive ensemble mutagenesis, 
exponential ensemble mutagenesis, site-specific mutagenesis, gene reassembly, 
recombination, recursive sequence recombination, phosphothioate-modified DNA 
mutagenesis, uracil-containing template mutagenesis, gapped duplex mutagenesis, point 
mismatch repair mutagenesis, repair-deficient host strain mutagenesis, chemical 
mutagenesis, radiogenic mutagenesis, deletion mutagenesis, restriction-selection 
mutagenesis, restriction-purification mutagenesis, artificial gene synthesis, ensemble 
mutagenesis, chimeric nucleic acid multimer creation and a combination thereof. 

114. A method for making a small molecule comprising the following steps: 

(a) providing a plurality of biosynthetic enzymes capable of synthesizing or 
modifying a small molecule, wherein one of the enzymes comprises a cellulase enzyme 
encoded by a nucleic acid comprising a sequence as set forth in claim 1; 

(b) providing a substrate for at least one of the enzymes of step (a); and 

(c) reacting the substrate of step (b) with the enzymes under conditions that 
facilitate a plurality of biocatalytic reactions to generate a small molecule by a series of 
biocatalytic reactions. 

115. A method for modifying a small molecule comprising the following steps: 
(a) providing a cellulase enzyme, wherein the enzyme comprises a polypeptide as 
set forth in claim 48, or a polypeptide encoded by a nucleic acid comprising a nucleic 
acid sequence as set forth in claim 1; 
(b) providing a small molecule; and 

(c) reacting the enzyme of step (a) with the small molecule of step (b) under 
conditions that facilitate an enzymatic reaction catalyzed by the cellulase enzyme, thereby 
modifying a small molecule by a cellulase enzymatic reaction, 

wherein optionally step (b) comprises providing a plurality of small molecule 
substrates for the enzyme of step (a), thereby generating a library of modified small 
molecules produced by at least one enzymatic reaction catalyzed by the cellulase enzyme; 

and optionally the method further comprises providing a plurality of additional 
enzymes under conditions that facilitate a plurality of biocatalytic reactions by the 
enzymes to form a library of modified small molecules produced by the plurality of 
enzymatic reactions; 

and optionally the method further comprises the step of testing the library to 
determine if a particular modified small molecule which exhibits a desired activity is 
present within the library, wherein optionally the step of testing the library further 
comprises the steps of systematically eliminating all but one of the biocatalytic reactions 
used to produce a portion of the plurality of the modified small molecules within the 
library by testing the portion of the modified small molecule for the presence or absence 
of the particular modified small molecule with a desired activity, and identifying at least 
one specific biocatalytic reaction that produces the particular modified small molecule of 
desired activity. 

116. A method for determining a functional fragment of a cellulase enzyme 
comprising the steps of: 

(a) providing a cellulase enzyme, wherein the enzyme comprises a polypeptide as 
set forth in claim 48, or a polypeptide encoded by a nucleic acid as set forth in claim 1; 
and 

(b) deleting a plurality of amino acid residues from the sequence of step (a) and 
testing the remaining subsequence for a cellulase activity, thereby determining a 
functional fragment of a cellulase enzyme, 

wherein optionally the cellulase activity is measured by providing a cellulase 
substrate and detecting a decrease in the amount of the substrate or an increase in the 
amount of a reaction product. 
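
By way of illustration only, the deletion and testing steps recited in claim 116 might be sketched as a truncation scan. The function name find_functional_fragments, the minimum fragment length, the example sequence and the motif-based has_activity test are all hypothetical; in practice the activity test would be a laboratory assay, for example one detecting a decrease in substrate or an increase in reaction product.

```python
def find_functional_fragments(sequence, has_activity, min_length=30):
    """Delete terminal residues and test each remaining subsequence.

    `has_activity` is a caller-supplied stand-in for an activity assay.
    Returns (first_residue, last_residue, fragment) tuples, 1-based and
    inclusive, for every proper subsequence that tests positive.
    """
    active = []
    n = len(sequence)
    for start in range(n):
        # Walk the C-terminal end inward, stopping at the minimum length.
        for end in range(n, start + min_length - 1, -1):
            if (start, end) == (0, n):
                continue  # skip the undeleted full-length sequence
            fragment = sequence[start:end]
            if has_activity(fragment):
                active.append((start + 1, end, fragment))
    return active


# Hypothetical sequence and toy "assay": activity requires the motif GEW.
toy_sequence = "MKTGEWLLA"
hits = find_functional_fragments(
    toy_sequence, lambda frag: "GEW" in frag, min_length=5
)
shortest = min(hits, key=lambda hit: len(hit[2]))
print(shortest)
```

The exhaustive scan grows quadratically with sequence length, so a real screen would likely test a chosen subset of deletions rather than every one.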

117. A method for whole cell engineering of new or modified phenotypes by 
using real-time metabolic flux analysis, the method comprising the following steps: 

(a) making a modified cell by modifying the genetic composition of a cell, 
wherein the genetic composition is modified by addition to the cell of a nucleic acid 
comprising a sequence as set forth in claim 1; 

(b) culturing the modified cell to generate a plurality of modified cells; 

(c) measuring at least one metabolic parameter of the cell by monitoring the cell 
culture of step (b) in real time; and, 

(d) analyzing the data of step (c) to determine if the measured parameter differs 
from a comparable measurement in an unmodified cell under similar conditions, thereby 
identifying an engineered phenotype in the cell using real-time metabolic flux analysis, 

wherein optionally the genetic composition of the cell is modified by a method 
comprising deletion of a sequence or modification of a sequence in the cell, or, knocking 
out the expression of a gene, 

and optionally the method further comprises selecting a cell comprising a newly 
engineered phenotype, 

and optionally the method further comprises culturing the selected cell, thereby 
generating a new cell strain comprising a newly engineered phenotype. 

118. An isolated or recombinant signal or leader sequence consisting of an 
amino acid sequence as set forth in the amino terminal residues 1 to 14, 1 to 15, 1 to 16, 1 
to 17, 1 to 18, 1 to 19, 1 to 20, 1 to 21, 1 to 22, 1 to 23, 1 to 24, 1 to 25, 1 to 26, 1 to 27, 1 
to 28, 1 to 28, 1 to 30, 1 to 31, 1 to 32, 1 to 33, 1 to 34, 1 to 35, 1 to 36, 1 to 37, 1 to 38, 1 
to 40, 1 to 41, 1 to 42, 1 to 43 or 1 to 44, of (a) an amino acid sequence as set forth in 
claim 48; or, (b) an amino acid sequence as set forth in SEQ ID NO:2, SEQ ID NO:4, 
SEQ ID NO:6, SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID 
NO:16, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID 
NO:26, SEQ ID NO:28, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:34, SEQ ID 
NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42, SEQ ID NO:44, SEQ ID 
NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54, SEQ ID 
NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID 
NO:66, SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID 
NO:76, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID 
NO:86, SEQ ID NO:88, SEQ ID NO:90, SEQ ID NO:92, SEQ ID NO:94, SEQ ID 
NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102, SEQ ID NO:104, SEQ ID 
NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114, SEQ ID 
NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID 
NO:126, SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID 
NO:136, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:144, SEQ ID 
NO:146, SEQ ID NO:148, SEQ ID NO:150, SEQ ID NO:152, SEQ ID NO:154, SEQ ID 
NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162, SEQ ID NO:164 or SEQ 
ID NO:166. 

119. A chimeric polypeptide comprising at least a first domain comprising a 
signal peptide (SP) or leader sequence having an amino acid sequence as set forth in 
claim 118, and at least a second domain comprising a heterologous polypeptide or 
peptide, wherein the heterologous polypeptide or peptide is not naturally associated with 
the signal peptide (SP) or leader sequence, 

and optionally the heterologous polypeptide or peptide is not a cellulase, and 
optionally the heterologous polypeptide or peptide is amino terminal to, carboxy terminal 
to or on both ends of the signal peptide (SP) or leader sequence. 

120. An isolated or recombinant nucleic acid encoding a chimeric polypeptide, 
wherein the chimeric polypeptide comprises at least a first domain comprising signal 
peptide (SP) or leader sequence having an amino acid sequence as set forth in claim 118 
and at least a second domain comprising a heterologous polypeptide or peptide, wherein 
the heterologous polypeptide or peptide is not naturally associated with the signal peptide 
(SP) or leader sequence. 

121. An isolated or recombinant nucleic acid comprising a sequence encoding a 
polypeptide having a cellulase activity and a signal sequence, wherein the nucleic acid 
comprises a sequence as set forth in claim 1. 

122. The isolated or recombinant nucleic acid of claim 121, wherein the signal 
sequence is derived from another cellulase or a non-cellulase enzyme. 

123. An isolated or recombinant nucleic acid comprising a sequence encoding a 
polypeptide having a cellulase activity, wherein the sequence does not contain a signal 
sequence and the nucleic acid comprises a sequence as set forth in claim 1. 

124. A method of increasing thermotolerance or thermostability of a cellulase 
polypeptide, the method comprising glycosylating a cellulase, wherein the polypeptide 
comprises at least thirty contiguous amino acids of a polypeptide as set forth in claim 48, 
or a polypeptide encoded by a nucleic acid as set forth in claim 1, thereby increasing the 
thermotolerance or thermostability of the cellulase. 

125. A method for overexpressing a recombinant cellulase in a cell comprising 
expressing a vector comprising a nucleic acid sequence as set forth in claim 1, wherein 
overexpression is effected by use of a high activity promoter, a dicistronic vector or by 
gene amplification of the vector. 

126. A method of making a transgenic plant comprising the following steps: 

(a) introducing a heterologous nucleic acid sequence into the cell, wherein the 
heterologous nucleic sequence comprises a sequence as set forth in claim 1, thereby 
producing a transformed plant cell; 

(b) producing a transgenic plant from the transformed cell, 

wherein optionally the step (a) further comprises introducing the heterologous 
nucleic acid sequence by electroporation or microinjection of plant cell protoplasts, 

and optionally step (a) comprises introducing the heterologous nucleic acid 
sequence directly to plant tissue by DNA particle bombardment or by using an 
Agrobacterium tumefaciens host. 

127. A method of expressing a heterologous nucleic acid sequence in a plant 
cell comprising the following steps: 

(a) transforming the plant cell with a heterologous nucleic acid sequence operably 
linked to a promoter, wherein the heterologous nucleic sequence comprises a sequence as 
set forth in claim 1; 

(b) growing the plant under conditions wherein the heterologous nucleic acid 
sequence is expressed in the plant cell. 

128. A method for hydrolyzing, breaking up or disrupting a glucan- or 
cellulose-comprising composition comprising the following steps: 

(a) providing a polypeptide having a cellulase activity as set forth in claim 48, or a 
polypeptide encoded by a nucleic acid as set forth in claim 1; 

(b) providing a composition comprising a cellulose or a glucan; and 

(c) contacting the polypeptide of step (a) with the composition of step (b) under 
conditions wherein the cellulase hydrolyzes, breaks up or disrupts the glucan- or 
cellulose-comprising composition, 

wherein optionally the composition comprises a plant cell, a bacterial cell, a yeast 
cell, an insect cell, or an animal cell, 

and optionally the polypeptide has endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase activity. 

129. A dough or a bread product comprising a polypeptide as set forth in claim 
48, or a polypeptide encoded by a nucleic acid as set forth in claim 1, wherein optionally 
the polypeptide has cellulase, endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase activity. 

130. A method of dough conditioning comprising contacting a dough or a bread 
product with at least one polypeptide as set forth in claim 48, or a polypeptide encoded by 
a nucleic acid as set forth in claim 1, under conditions sufficient for conditioning the 
dough. 

131. A beverage comprising a polypeptide as set forth in claim 48, or a 
polypeptide encoded by a nucleic acid as set forth in claim 1, wherein optionally the 
polypeptide has endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase 
activity. 

132. A method of beverage production comprising administration of at least one 
polypeptide as set forth in claim 48, or a polypeptide encoded by a nucleic acid as set 
forth in claim 1, to a beverage or a beverage precursor under conditions sufficient for 
decreasing the viscosity of the beverage, 

wherein optionally the beverage or beverage precursor is a wort or a beer. 

133. A food, a feed or a nutritional supplement comprising a polypeptide as set 
forth in claim 48, or a polypeptide encoded by a nucleic acid as set forth in claim 1, 
wherein optionally the polypeptide has cellulase, endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase activity. 

134. A method for utilizing a cellulase as a nutritional supplement in an animal 
diet, the method comprising: 

preparing a nutritional supplement containing a cellulase enzyme comprising at 
least thirty contiguous amino acids of a polypeptide as set forth in claim 48, or a 
polypeptide encoded by a nucleic acid as set forth in claim 1; and 

administering the nutritional supplement to an animal to increase utilization of a 
xylan contained in a feed or a food ingested by the animal, 

wherein optionally the animal is a human, or the animal is a ruminant or a 
monogastric animal, 

and optionally the cellulase enzyme is prepared by expression of a polynucleotide 
encoding the cellulase in an organism selected from the group consisting of a bacterium, a 
yeast, a plant, an insect, a fungus and an animal, and optionally the organism is selected 
from the group consisting of an S. pombe, S. cerevisiae, Pichia pastoris, E. coli, 
Streptomyces sp., Bacillus sp. and Lactobacillus sp. 

135. An edible enzyme delivery matrix or pellet comprising a thermostable 
recombinant cellulase enzyme comprising a polypeptide as set forth in claim 48, or a 
polypeptide encoded by a nucleic acid as set forth in claim 1, wherein optionally the 
polypeptide has cellulase, endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase activity. 

136. A method for delivering a cellulase supplement to an animal, the method 
comprising: preparing an edible enzyme delivery matrix or pellets comprising a granulate 
edible carrier and a thermostable recombinant cellulase enzyme, wherein the pellets 
readily disperse the cellulase enzyme contained therein into aqueous media, and the 
recombinant cellulase enzyme comprises a polypeptide as set forth in claim 48, or a 
polypeptide encoded by a nucleic acid as set forth in claim 1; and, administering the 
edible enzyme delivery matrix or pellet to the animal, 

wherein optionally the granulate edible carrier comprises a carrier selected from 
the group consisting of a grain germ, a grain germ that is spent of oil, a hay, an alfalfa, a 
timothy, a soy hull, a sunflower seed meal and a wheat midd, 

and optionally the edible carrier comprises grain germ that is spent of oil, 

and optionally the cellulase enzyme is glycosylated to provide thermostability at 
pelletizing conditions, 

and optionally the delivery matrix is formed by pelletizing a mixture comprising a 
grain germ and a cellulase, 

and optionally the pelletizing conditions include application of steam, and 
optionally the pelletizing conditions comprise application of a temperature in excess of 
about 80°C for about 5 minutes and the enzyme retains a specific activity of at least 350 
to about 900 units per milligram of enzyme. 

137. A cellulose- or cellulose derivative- composition comprising a polypeptide 
as set forth in claim 48, or a polypeptide encoded by a nucleic acid as set forth in claim 1, 
wherein optionally the polypeptide has cellulase, endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase activity. 

138. A wood, wood pulp or wood product comprising a cellulase as set forth in 
claim 48, or a cellulase encoded by a nucleic acid as set forth in claim 1, wherein 
optionally the cellulase activity comprises endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase activity. 

139. A paper, paper pulp or paper product comprising a polypeptide as set forth 
in claim 48, or a polypeptide encoded by a nucleic acid as set forth in claim 1, wherein 
optionally the polypeptide has cellulase, endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase activity. 

140. A method for reducing the amount of cellulose in a paper, a wood or wood 
product comprising contacting the paper, wood or wood product with a cellulase as set 
forth in claim 48, or a cellulase encoded by a nucleic acid as set forth in claim 1, wherein 
optionally the cellulase activity comprises endoglucanase, cellobiohydrolase, mannanase 
and/or beta-glucosidase activity. 

141. A detergent composition comprising a cellulase as set forth in claim 48, or 
a cellulase encoded by a nucleic acid as set forth in claim 1, 

wherein optionally the polypeptide is formulated in a non-aqueous liquid 
composition, a cast solid, a granular form, a particulate form, a compressed tablet, a gel 
form, a paste or a slurry form, 

and optionally the cellulase activity comprises endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase activity. 

142. A pharmaceutical composition or dietary supplement comprising a 
cellulase as set forth in claim 48, or a cellulase encoded by a nucleic acid as set forth in 
claim 1, 

wherein optionally the cellulase is formulated as a tablet, gel, pill, implant, liquid, 
spray, powder, food, feed pellet or as an encapsulated formulation, 

and optionally the cellulase activity comprises endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase activity. 

143. A fuel comprising a polypeptide as set forth in claim 48, or a polypeptide 
encoded by a nucleic acid as set forth in claim 1, wherein optionally the polypeptide has 
activity comprising cellulase, endoglucanase, cellobiohydrolase, mannanase and/or 
beta-glucosidase activity, 

wherein optionally the fuel is derived from a plant material, which optionally 
comprises potatoes, soybean (rapeseed), barley, rye, corn, oats, wheat, beets or sugar 
cane, 

and optionally the fuel comprises a bioethanol or a gasoline-ethanol mix. 

144. A method for making a fuel comprising contacting a composition 
comprising a cellulose or a fermentable sugar with a polypeptide as set forth in claim 48, 
or a polypeptide encoded by a nucleic acid as set forth in claim 1, 

wherein optionally the composition comprising a cellulose or a fermentable sugar 
comprises a plant, plant product or plant derivative, and optionally the plant or plant 
product comprises cane sugar plants or plant products, beets or sugarbeets, wheat, corn, 
soybeans, potato, rice or barley, 

and optionally the polypeptide has activity comprising cellulase, endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase activity, 

and optionally the fuel comprises a bioethanol or a gasoline-ethanol mix. 

145. A method for making bioethanol comprising contacting a composition 
comprising a cellulose or a fermentable sugar with a polypeptide as set forth in claim 48, 
or a polypeptide encoded by a nucleic acid as set forth in claim 1, 

wherein optionally the composition comprising a cellulose or a fermentable sugar 
comprises a plant, plant product or plant derivative, and optionally the plant or plant 
product comprises cane sugar plants or plant products, beets or sugarbeets, wheat, corn, 
soybeans, potato, rice or barley, 

and optionally the polypeptide has activity comprising cellulase, endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase activity. 

146. An enzyme ensemble for depolymerization of cellulosic and 
hemicellulosic polymers to metabolizeable carbon moieties comprising a polypeptide as 
set forth in claim 48, or a polypeptide encoded by a nucleic acid as set forth in claim 1, 

wherein optionally the polypeptide has activity comprising cellulase, 
endoglucanase, cellobiohydrolase, mannanase and/or beta-glucosidase activity. 

147. A method for processing a biomass material comprising lignocellulose 
comprising contacting a composition comprising a cellulose or a fermentable sugar with a 
polypeptide as set forth in claim 48, or a polypeptide encoded by a nucleic acid as set 
forth in claim 1, 

wherein optionally the biomass material comprising lignocellulose is derived from 
an agricultural crop, is a byproduct of a food or a feed production, is a lignocellulosic 
waste product, or is a plant residue or a waste paper or waste paper product, and 
optionally the polypeptide has activity comprising cellulase, endoglucanase, 
cellobiohydrolase, mannanase and/or beta-glucosidase activity, 

and optionally the plant residue comprises stems, leaves, hulls, husks, cobs, wood, 
wood chips, wood pulp and sawdust, 

and optionally the paper waste comprises discarded or used photocopy paper, 
computer printer paper, notebook paper, notepad paper, typewriter paper, newspapers, 
magazines, cardboard and paper-based packaging materials, 

and optionally the processing of the biomass material generates a bioethanol. 

148. A dairy product comprising a polypeptide as set forth in claim 48, or a 
polypeptide encoded by a nucleic acid as set forth in claim 1, wherein optionally the dairy 
product comprises a milk, an ice cream, a cheese or a yogurt, and optionally the 
polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase activity. 

149. A method for improving texture and flavor of a dairy product comprising 
the following steps: (a) providing a polypeptide as set forth in claim 48, or a polypeptide 
encoded by a nucleic acid as set forth in claim 1; (b) providing a dairy product; and (c) 
contacting the polypeptide of step (a) and the dairy product of step (b) under conditions 
wherein the cellulase can improve the texture or flavor of the dairy product. 

150. A textile or fabric comprising a polypeptide as set forth in claim 48, or a 
polypeptide encoded by a nucleic acid as set forth in claim 1, wherein optionally the 
textile or fabric comprises a cellulose-containing fiber, and optionally the polypeptide has
activity comprising cellulase, endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity.

151. A method for treating solid or liquid animal waste products comprising the
following steps:

(a) providing a polypeptide as set forth in claim 48, or a polypeptide encoded by a
nucleic acid as set forth in claim 1, wherein optionally the polypeptide has activity
comprising cellulase, endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity;

(b) providing a solid or a liquid animal waste; and

(c) contacting the polypeptide of step (a) and the solid or liquid waste of step (b)
under conditions wherein the protease can treat the waste.

152. A processed waste product comprising a polypeptide as set forth in claim
48, or a polypeptide encoded by a nucleic acid as set forth in claim 1, wherein optionally 
the polypeptide has activity comprising cellulase, endoglucanase, cellobiohydrolase, 
mannanase and/or beta-glucosidase activity. 
153. A disinfectant comprising a polypeptide having a cellulase activity,
wherein the polypeptide comprises a sequence as set forth in claim 48, or a polypeptide
encoded by a nucleic acid as set forth in claim 1, wherein optionally the polypeptide has
activity comprising endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity.

154. A biodefense or bio-detoxifying agent comprising a polypeptide having a
cellulase activity, wherein the polypeptide comprises a sequence as set forth in claim 48,
or a polypeptide encoded by a nucleic acid as set forth in claim 1, wherein optionally the
polypeptide has activity comprising endoglucanase, cellobiohydrolase, mannanase and/or
beta-glucosidase activity.

155. An isolated or recombinant nucleic acid having a sequence comprising at
least one nucleotide base residue modification of SEQ ID NO:163, wherein the
modification comprises one or more of the following changes:

a nucleotide at any one of positions 265 to 267 is modified to CGT, CGC, CGA,
CGG, AGA or AGG;

a nucleotide at any one of positions 307 to 309 is modified to GGT, GGC, GGA
or GGG;

a nucleotide at any one of positions 328 to 330 is modified to GGT, GGC, GGA
or GGG;

a nucleotide at any one of positions 340 to 342 is modified to TTA, TTG, CTT,
CTC, CTA or CTG;

a nucleotide at any one of positions 469 to 471 is modified to TCT, TCC, TCA,
TCG, AGT or AGC;

a nucleotide at any one of positions 1441 to 1443 is modified to TTT or TTC;

a nucleotide at any one of positions 1648 to 1650 is modified to AAT or AAC; or

a nucleotide at any one of positions 1768 to 1770 is modified to CGT, CGC,
CGA, CGG, AGA or AGG.

156. An isolated or recombinant polypeptide having a sequence comprising at
least one amino acid residue modification of SEQ ID NO:164, wherein the modification
comprises one or more of the following changes:

a methionine at amino acid position 89 is modified to arginine;

a phenylalanine at amino acid position 103 is modified to glycine;

a proline at amino acid position 110 is modified to glycine;

a tyrosine at amino acid position 114 is modified to leucine;

an alanine at amino acid position 157 is modified to serine;

a tryptophan at amino acid position 481 is modified to phenylalanine;

a proline at amino acid position 550 is modified to asparagine; or

a glycine at amino acid position 590 is modified to arginine.
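
The nucleotide positions recited in claim 155 line up with the amino acid positions recited in claim 156 if one assumes that the coding sequence of SEQ ID NO:163 begins at nucleotide 1 and is read in frame. The following is only an illustrative sketch under that assumption; the names `CODONS`, `SUBSTITUTIONS`, `codon_span` and `describe` are hypothetical and are not part of the specification. The codon sets are those of the standard genetic code.

```python
# Codons of the standard genetic code for the replacement residues named in the claims.
CODONS = {
    "Arg": {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"},
    "Gly": {"GGT", "GGC", "GGA", "GGG"},
    "Leu": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "Ser": {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"},
    "Phe": {"TTT", "TTC"},
    "Asn": {"AAT", "AAC"},
}

# (1-based amino acid position in SEQ ID NO:164, replacement residue), as listed in claim 156.
SUBSTITUTIONS = [
    (89, "Arg"), (103, "Gly"), (110, "Gly"), (114, "Leu"),
    (157, "Ser"), (481, "Phe"), (550, "Asn"), (590, "Arg"),
]


def codon_span(aa_position):
    """Return the 1-based (first, last) nucleotide positions of the codon
    for a 1-based amino acid position, assuming the reading frame starts
    at nucleotide 1 (an assumption made for this sketch)."""
    first = 3 * (aa_position - 1) + 1
    return first, first + 2


def describe(substitutions):
    """List each substitution with its codon span and the codons that encode it."""
    rows = []
    for position, residue in substitutions:
        first, last = codon_span(position)
        rows.append((position, residue, first, last, sorted(CODONS[residue])))
    return rows


if __name__ == "__main__":
    for position, residue, first, last, codons in describe(SUBSTITUTIONS):
        print(f"residue {position} -> {residue}: nucleotides {first}-{last}, codons {', '.join(codons)}")
```

Under the stated assumption, residue 89 maps to nucleotides 265 to 267 and residue 590 maps to nucleotides 1768 to 1770, which matches the position ranges recited in claim 155.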

157. An isolated or recombinant nucleic acid having a sequence comprising a
nucleotide residue sequence modification of SEQ ID NO:1, SEQ ID NO:3, SEQ ID NO:5,
SEQ ID NO:7, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:13, SEQ ID NO:15, SEQ ID NO:17,
SEQ ID NO:19, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:29,
SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:39, SEQ ID NO:41,
SEQ ID NO:43, SEQ ID NO:45, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:51, SEQ ID NO:53,
SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:59, SEQ ID NO:61, SEQ ID NO:63, SEQ ID NO:65,
SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:75, SEQ ID NO:77,
SEQ ID NO:79, SEQ ID NO:81, SEQ ID NO:83, SEQ ID NO:85, SEQ ID NO:87, SEQ ID NO:89,
SEQ ID NO:91, SEQ ID NO:93, SEQ ID NO:95, SEQ ID NO:97, SEQ ID NO:99, SEQ ID NO:101,
SEQ ID NO:103, SEQ ID NO:105, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:111, SEQ ID NO:113,
SEQ ID NO:115, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:121, SEQ ID NO:123, SEQ ID NO:125,
SEQ ID NO:127, SEQ ID NO:129, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:137,
SEQ ID NO:139, SEQ ID NO:141, SEQ ID NO:143, SEQ ID NO:145, SEQ ID NO:147, SEQ ID NO:149,
SEQ ID NO:151, SEQ ID NO:153, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:159, SEQ ID NO:161,
SEQ ID NO:163 or SEQ ID NO:165,
wherein the modification comprises one or more of the following changes:

a nucleotide at the equivalent of any one of positions 265 to 267 of SEQ ID
NO:163 is changed to CGT, CGC, CGA, CGG, AGA or AGG;

a nucleotide at the equivalent of any one of positions 307 to 309 of SEQ ID
NO:163 is changed to GGT, GGC, GGA or GGG;

a nucleotide at the equivalent of any one of positions 328 to 330 of SEQ ID
NO:163 is changed to GGT, GGC, GGA or GGG;

a nucleotide at the equivalent of any one of positions 340 to 342 of SEQ ID
NO:163 is changed to TTA, TTG, CTT, CTC, CTA or CTG;

a nucleotide at the equivalent of any one of positions 469 to 471 of SEQ ID
NO:163 is changed to TCT, TCC, TCA, TCG, AGT or AGC;

a nucleotide at the equivalent of positions 1441 to 1443 of SEQ ID NO:163 is
changed to TTT or TTC;

a nucleotide at the equivalent of any one of positions 1648 to 1650 of SEQ ID
NO:163 is changed to AAT or AAC; or

a nucleotide at the equivalent of any one of positions 1768 to 1770 of SEQ ID
NO:163 is changed to CGT, CGC, CGA, CGG, AGA or AGG.

158. An isolated or recombinant nucleic acid having a sequence comprising a
nucleotide residue sequence modification of a nucleic acid as set forth in claim 1, wherein
the modification comprises one or more of the following changes:

a nucleotide at the equivalent of any one of positions 265 to 267 of SEQ ID
NO:163 is changed to CGT, CGC, CGA, CGG, AGA or AGG;

a nucleotide at the equivalent of any one of positions 307 to 309 of SEQ ID
NO:163 is changed to GGT, GGC, GGA or GGG;

a nucleotide at the equivalent of any one of positions 328 to 330 of SEQ ID
NO:163 is changed to GGT, GGC, GGA or GGG;

a nucleotide at the equivalent of any one of positions 340 to 342 of SEQ ID
NO:163 is changed to TTA, TTG, CTT, CTC, CTA or CTG;

a nucleotide at the equivalent of any one of positions 469 to 471 of SEQ ID
NO:163 is changed to TCT, TCC, TCA, TCG, AGT or AGC;

a nucleotide at the equivalent of positions 1441 to 1443 of SEQ ID NO:163 is
changed to TTT or TTC;

a nucleotide at the equivalent of any one of positions 1648 to 1650 of SEQ ID
NO:163 is changed to AAT or AAC; or

a nucleotide at the equivalent of any one of positions 1768 to 1770 of SEQ ID
NO:163 is changed to CGT, CGC, CGA, CGG, AGA or AGG.

159. An isolated or recombinant polypeptide having a sequence comprising an
amino acid residue modification of SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:6,
SEQ ID NO:8, SEQ ID NO:10, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:16, SEQ ID NO:18,
SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:24, SEQ ID NO:26, SEQ ID NO:28, SEQ ID NO:30,
SEQ ID NO:32, SEQ ID NO:34, SEQ ID NO:36, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:42,
SEQ ID NO:44, SEQ ID NO:46, SEQ ID NO:48, SEQ ID NO:50, SEQ ID NO:52, SEQ ID NO:54,
SEQ ID NO:56, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:64, SEQ ID NO:66,
SEQ ID NO:68, SEQ ID NO:70, SEQ ID NO:72, SEQ ID NO:74, SEQ ID NO:76, SEQ ID NO:78,
SEQ ID NO:80, SEQ ID NO:82, SEQ ID NO:84, SEQ ID NO:86, SEQ ID NO:88, SEQ ID NO:90,
SEQ ID NO:92, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:102,
SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:108, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:114,
SEQ ID NO:116, SEQ ID NO:118, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:124, SEQ ID NO:126,
SEQ ID NO:128, SEQ ID NO:130, SEQ ID NO:132, SEQ ID NO:134, SEQ ID NO:136, SEQ ID NO:138,
SEQ ID NO:140, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:146, SEQ ID NO:148, SEQ ID NO:150,
SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:156, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:162,
SEQ ID NO:164 or SEQ ID NO:166, wherein the modification
comprises one or more of the following changes:

an amino acid at the equivalent of the methionine at amino acid position 89 of
SEQ ID NO:164 is changed to an arginine;

an amino acid at the equivalent of the phenylalanine at amino acid position 103 of
SEQ ID NO:164 is changed to a glycine;

an amino acid at the equivalent of the proline at amino acid position 110 of SEQ
ID NO:164 is changed to a glycine;

an amino acid at the equivalent of the tyrosine at amino acid position 114 of SEQ
ID NO:164 is changed to a leucine;

an amino acid at the equivalent of the alanine at amino acid position 157 of SEQ
ID NO:164 is changed to a serine;

an amino acid at the equivalent of the tryptophan at amino acid position 481 of
SEQ ID NO:164 is changed to a phenylalanine;

an amino acid at the equivalent of the proline at amino acid position 550 of SEQ
ID NO:164 is changed to an asparagine; or

an amino acid at the equivalent of the glycine at amino acid position 590 of SEQ
ID NO:164 is changed to an arginine.
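
The claims above refer to the "equivalent" of a position in SEQ ID NO:164 within another sequence. This passage does not state how equivalence is determined. One common reading, assumed here purely for illustration, is that equivalent positions are those paired in a pairwise sequence alignment. The sketch below maps a reference position to a target position given two already-aligned, gapped strings; the function name `equivalent_position` and the toy sequences are hypothetical.

```python
def equivalent_position(aligned_ref, aligned_target, ref_position):
    """Map a 1-based ungapped position in the reference sequence to the
    1-based ungapped position in the target sequence that shares its
    alignment column. Returns None if the reference residue is aligned
    to a gap ('-') in the target."""
    if len(aligned_ref) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    ref_count = 0
    target_count = 0
    for ref_char, target_char in zip(aligned_ref, aligned_target):
        if ref_char != "-":
            ref_count += 1
        if target_char != "-":
            target_count += 1
        if ref_char != "-" and ref_count == ref_position:
            return target_count if target_char != "-" else None
    raise ValueError("ref_position is beyond the end of the reference")


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not taken from the sequence listing).
    reference = "MKT-AY"
    target = "-KTGAY"
    for position in range(1, 6):
        print(position, equivalent_position(reference, target, position))
```

In practice the aligned strings would come from whatever alignment method is chosen, and the choice of method and scoring parameters can shift which residue counts as equivalent.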

160. An isolated or recombinant polypeptide having a sequence comprising an
amino acid residue modification of a polypeptide as set forth in claim 48, wherein the
modification comprises one or more of the following changes:

an amino acid at the equivalent of the methionine at amino acid position 89 of
SEQ ID NO:164 is changed to an arginine;

an amino acid at the equivalent of the phenylalanine at amino acid position 103 of
SEQ ID NO:164 is changed to a glycine;

an amino acid at the equivalent of the proline at amino acid position 110 of SEQ
ID NO:164 is changed to a glycine;

an amino acid at the equivalent of the tyrosine at amino acid position 114 of SEQ
ID NO:164 is changed to a leucine;

an amino acid at the equivalent of the alanine at amino acid position 157 of SEQ
ID NO:164 is changed to a serine;

an amino acid at the equivalent of the tryptophan at amino acid position 481 of
SEQ ID NO:164 is changed to a phenylalanine;

an amino acid at the equivalent of the proline at amino acid position 550 of SEQ
ID NO:164 is changed to an asparagine; or

an amino acid at the equivalent of the glycine at amino acid position 590 of SEQ
ID NO:164 is changed to an arginine.

161. The isolated or recombinant polypeptide of claim 48, wherein the 
polypeptide having a sequence as set forth in: 

(i) SEQ ID NO:164, has alkaline endoglucanase/cellulase activity;

(ii) SEQ ID NO:110, has xylanase activity;

(iii) SEQ ID NO:12, has NAD binding oxidoreductase activity;

(iv) SEQ ID NO:118, has short chain dehydrogenase activity;

(v) SEQ ID NO:14, has NADH dependent dehydrogenase activity;

(vi) SEQ ID NO:138, has peptidase activity;

(vii) SEQ ID NO:162, has alkaline endoglucanase activity;

(viii) SEQ ID NO:42, has cysteinyl tRNA synthetase activity;

(viii) SEQ ID NO:32, has cellodextrin phosphorylase activity;

(ix) SEQ ID NO:50, has fdhd/narq oxidoreductase activity;

(x) SEQ ID NO:54, has a radical S-adenosylmethionine (SAM) methyltransferase
activity; or

(xi) SEQ ID NO:58, has a subtilisin-like protease activity.
SEQUENCE LISTING

<110> DIVERSA CORPORATION
BLUM, David
GEMSCH, Joslin
DYCAICO, Mark

<120> CELLULASES, NUCLEIC ACIDS ENCODING THEM
AND METHODS FOR MAKING AND USING THEM

<130> 564462014240

<140> Not Yet Assigned
<141> Filed Herewith

<150> US 60/662,224
<151> 2005-03-15

<160> 166

<170> PatentIn version 3.1

<210> 1
<211> 1323
<212> DNA
<213> Unknown

<220>
<223> Obtained from environmental sample

<400> 1



atgtcaacct ataaatttcc gcacaacttt ttttggggag ccgcaaccgc gtcttatcag 60
atcgaaggcg catggaacga ggatggcaaa ggcgaatcca tttgggatcg cttcagccat 120
acgcccggaa aggtcaccaa tgccgatacc ggtgacatcg cctgtgacca ctatcaccgt 180
tgggaggaag atatcgccct tatgcgccaa cttgggttga aggcgtaccg cttttccact 240
tcatggcccc gtgtgatccc ggcgggccgc agacgggtga atgtcaaagg gctggatttc 300
tacgatcgcc tggtggatgg tctgtgcgcc gcgaacatcg aaccgttcct caccctgtat 360
cactgggacc tgccgcaggc tcttcaagac gaaggcggct gggataatcg caacaccgcc 420
catgcctttg ccgattatgc cgcattgatg gtgaaacgac ttggcgaccg tatccgctat 480
tggacgacgt tcaacgaacc cagcgttgtg gcgttcaatg gtcattactc aggctcgcac 540
gccccgggca ttcaagatgc ccgtgttacc cgccaggtgg tgcatcattt gctggtggcg 600
catgggttgg ctgtgcaggc gatccgcggc gcaaactcca aagtggatgt gggcatcgtg 660
cttaatttat ggcccgccga acccgattcg gactcccccg aagatgccgc cgccgccgaa 720
gccgcctgga accggcacga gaccctgttc cttgacccca tctttaaggc gcattatccc 780
gtatctgccc ttgatgcgat tggggaggat atgccccgca tccacgacgg cgatctggcg 840
ttgatctctc aggaattgga ttttgtcggc atcaactatt actcccgcca tgtggtcagt 900
gccacaaaag aaataggcag gcttcccgaa tcggaataca ctgaaatggg ctgggaagta 960
tgcgcccccg cactccgccg cctgctggtc aagatccata acgattaccg tttgccgccc 1020
atctatatca ccgaaaacgg atcggcattc aaggacgaag ttaacgcaga cggaaaggtt 1080
catgacccgc ggcggttgga ttacctgaaa caacacctga ttcaactttg ccttgccatg 1140
caggacggcg tggatgtgcg cggctacatg gcttggtccc tgctggataa tttcgagtgg 1200
ggtcacggct tttccaagcg ctttggcttg gtccatgtgg attacgagag ccagaagcgg 1260
attattaaag actcgggtga atggtatgca agtgtgatac ggaagaacga ggttgttgaa 1320
taa 1323



<210> 2 
<211> 440 
<212> PRT 
<213> Unknown 



<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (2)...(438)
<223> Glycosyl hydrolase family 1

<220>
<221> SITE
<222> (10)...(24)
<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<220>
<221> SITE
<222> (351)...(354)
<223> N-glycosylation site. Prosite id = PS00001

<400> 2

Met Ser Thr Tyr Lys Phe Pro His Asn Phe Phe Trp Gly Ala Ala Thr
1 5 10 15

Ala Ser Tyr Gln Ile Glu Gly Ala Trp Asn Glu Asp Gly Lys Gly Glu
20 25 30

Ser Ile Trp Asp Arg Phe Ser His Thr Pro Gly Lys Val Thr Asn Ala
35 40 45

Asp Thr Gly Asp Ile Ala Cys Asp His Tyr His Arg Trp Glu Glu Asp
50 55 60

Ile Ala Leu Met Arg Gln Leu Gly Leu Lys Ala Tyr Arg Phe Ser Thr
65 70 75 80

Ser Trp Pro Arg Val Ile Pro Ala Gly Arg Arg Arg Val Asn Val Lys
85 90 95

Gly Leu Asp Phe Tyr Asp Arg Leu Val Asp Gly Leu Cys Ala Ala Asn
100 105 110

Ile Glu Pro Phe Leu Thr Leu Tyr His Trp Asp Leu Pro Gln Ala Leu
115 120 125

Gln Asp Glu Gly Gly Trp Asp Asn Arg Asn Thr Ala His Ala Phe Ala
130 135 140

Asp Tyr Ala Ala Leu Met Val Lys Arg Leu Gly Asp Arg Ile Arg Tyr
145 150 155 160

Trp Thr Thr Phe Asn Glu Pro Ser Val Val Ala Phe Asn Gly His Tyr
165 170 175

Ser Gly Ser His Ala Pro Gly Ile Gln Asp Ala Arg Val Thr Arg Gln
180 185 190

Val Val His His Leu Leu Val Ala His Gly Leu Ala Val Gln Ala Ile
195 200 205

Arg Gly Ala Asn Ser Lys Val Asp Val Gly Ile Val Leu Asn Leu Trp
210 215 220

Pro Ala Glu Pro Asp Ser Asp Ser Pro Glu Asp Ala Ala Ala Ala Glu
225 230 235 240

Ala Ala Trp Asn Arg His Glu Thr Leu Phe Leu Asp Pro Ile Phe Lys
245 250 255

Ala His Tyr Pro Val Ser Ala Leu Asp Ala Ile Gly Glu Asp Met Pro
260 265 270

Arg Ile His Asp Gly Asp Leu Ala Leu Ile Ser Gln Glu Leu Asp Phe
275 280 285

Val Gly Ile Asn Tyr Tyr Ser Arg His Val Val Ser Ala Thr Lys Glu
290 295 300

Ile Gly Arg Leu Pro Glu Ser Glu Tyr Thr Glu Met Gly Trp Glu Val
305 310 315 320

Cys Ala Pro Ala Leu Arg Arg Leu Leu Val Lys Ile His Asn Asp Tyr
325 330 335

Arg Leu Pro Pro Ile Tyr Ile Thr Glu Asn Gly Ser Ala Phe Lys Asp
340 345 350

Glu Val Asn Ala Asp Gly Lys Val His Asp Pro Arg Arg Leu Asp Tyr
355 360 365

Leu Lys Gln His Leu Ile Gln Leu Cys Leu Ala Met Gln Asp Gly Val
370 375 380

Asp Val Arg Gly Tyr Met Ala Trp Ser Leu Leu Asp Asn Phe Glu Trp
385 390 395 400

Gly His Gly Phe Ser Lys Arg Phe Gly Leu Val His Val Asp Tyr Glu
405 410 415

Ser Gln Lys Arg Ile Ile Lys Asp Ser Gly Glu Trp Tyr Ala Ser Val
420 425 430

Ile Arg Lys Asn Glu Val Val Glu
435 440



<210> 3 
<211> 1389 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 3 



atgagcgctc cgagtcccgc ccgccccgtg tcctttcctc cccgcttcgt gtggggagcc 60
gcggccgcat cctatcaaat cgagggcgcc gtccgggagg acggcaaggg cccttcggtg 120
tgggacatgt tctgcgagaa gccgggagcc gtcttcgagg ggcacgacgg ggcggtggct 180
tgcgatcact accaccgtta ccgggaagac gtggccctga tgcggcagat tgggctccag 240
gcttaccgcc tgagcgtgtg ctggcccagg gtgctgcccg aggggaccgg gcagcccaac 300
gagaaggggc tcgacttcta ctcccggctc gtcgacgcct tgctcgaggc ggggatcacg 360
ccttgggtca ccctttttca ctgggactac ccactagccc tatatcaccg gggaggctgg 420
ctcaatcggg atagctcaga ctggttcggc gagtacgcgg gtctgattgc ggagcgcctc 480
tccgatcggg tgagccactt cttcacccag aacgagccGc aggtgtacat cggcttcggg 540
cacctcgagg ggaaacacgc gccgggcgat acccttcccc tgtcgcagat gctgctggcc 600
ggtcaccaca gcctgctcgc ccatggaaag gccgtgcagg cgctgcgcgc ccacggcaag 660
cagcagctgc gggttggata cgctccggtg gggatgccgc tgcatccggt cagcgagtcc 720
gccgaagacg tggcggctgc acgcaccgcc actttccgcg tccgagagaa gaattcctgg 780
aacaacgctt ggtggatgga cccggtgtac ctcggtgagt accccgccca agggctcgag 840
ttctacgggc gagacgtccc cgcgatccgg tccggagaca tggaactcat ccggcaaccc 900
ttggactttt tcggcgtcaa catctaccag agcacgcccg tgcgcgccgc gggggcgccc 960
caggggttcg aggtcgtccg gcatccgacg ggccacccca tcaccgcgtt caactggccg 1020
gttacgccac aggccttgta ttgggggccg cggttcttct acgagcgcta tggcaagccc 1080
atcgtcatta cggaaaacgg gctttcctgc cgagacgtga tcgcccttga cggcaaggtg 1140
cacgatccgt cccgcatcga cttcaccacg cgctacctgc gcgagctcca ccgcgccatc 1200
gccgaaggca acgaggtgga gggctacttc cactggtcca tcatggacaa cttcgaatgg 1260
gctgccggat accgagaacg cttcgggctc gttcacgtgg attacgagac cctggtgagg 1320
acacccaagg actctgcggc gtggtaccgc caggtcatcc agagcaacgg ggccgtgctg 1380
ttcgattga 1389



<210> 4
<211> 462
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (8)...(458)
<223> Glycosyl hydrolase family 1

<220>
<221> SITE
<222> (16)...(30)
<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<220>
<221> SITE
<222> (366)...(374)
<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572

<400> 4

Met Ser Ala Pro Ser Pro Ala Arg Pro Val Ser Phe Pro Pro Arg Phe
1 5 10 15

Val Trp Gly Ala Ala Ala Ala Ser Tyr Gln Ile Glu Gly Ala Val Arg
20 25 30

Glu Asp Gly Lys Gly Pro Ser Val Trp Asp Met Phe Cys Glu Lys Pro
35 40 45

Gly Ala Val Phe Glu Gly His Asp Gly Ala Val Ala Cys Asp His Tyr
50 55 60

His Arg Tyr Arg Glu Asp Val Ala Leu Met Arg Gln Ile Gly Leu Gln
65 70 75 80

Ala Tyr Arg Leu Ser Val Cys Trp Pro Arg Val Leu Pro Glu Gly Thr
85 90 95

Gly Gln Pro Asn Glu Lys Gly Leu Asp Phe Tyr Ser Arg Leu Val Asp
100 105 110

Ala Leu Leu Glu Ala Gly Ile Thr Pro Trp Val Thr Leu Phe His Trp
115 120 125

Asp Tyr Pro Leu Ala Leu Tyr His Arg Gly Gly Trp Leu Asn Arg Asp
130 135 140

Ser Ser Asp Trp Phe Gly Glu Tyr Ala Gly Leu Ile Ala Glu Arg Leu
145 150 155 160

Ser Asp Arg Val Ser His Phe Phe Thr Gln Asn Glu Pro Gln Val Tyr
165 170 175

Ile Gly Phe Gly His Leu Glu Gly Lys His Ala Pro Gly Asp Thr Leu
180 185 190

Pro Leu Ser Gln Met Leu Leu Ala Gly His His Ser Leu Leu Ala His
195 200 205

Gly Lys Ala Val Gln Ala Leu Arg Ala His Gly Lys Gln Gln Leu Arg
210 215 220

Val Gly Tyr Ala Pro Val Gly Met Pro Leu His Pro Val Ser Glu Ser
225 230 235 240

Ala Glu Asp Val Ala Ala Ala Arg Thr Ala Thr Phe Arg Val Arg Glu
245 250 255

Lys Asn Ser Trp Asn Asn Ala Trp Trp Met Asp Pro Val Tyr Leu Gly
260 265 270

Glu Tyr Pro Ala Gln Gly Leu Glu Phe Tyr Gly Arg Asp Val Pro Ala
275 280 285

Ile Arg Ser Gly Asp Met Glu Leu Ile Arg Gln Pro Leu Asp Phe Phe
290 295 300

Gly Val Asn Ile Tyr Gln Ser Thr Pro Val Arg Ala Ala Gly Ala Pro
305 310 315 320

Gln Gly Phe Glu Val Val Arg His Pro Thr Gly His Pro Ile Thr Ala
325 330 335

Phe Asn Trp Pro Val Thr Pro Gln Ala Leu Tyr Trp Gly Pro Arg Phe
340 345 350

Phe Tyr Glu Arg Tyr Gly Lys Pro Ile Val Ile Thr Glu Asn Gly Leu
355 360 365

Ser Cys Arg Asp Val Ile Ala Leu Asp Gly Lys Val His Asp Pro Ser
370 375 380

Arg Ile Asp Phe Thr Thr Arg Tyr Leu Arg Glu Leu His Arg Ala Ile
385 390 395 400

Ala Glu Gly Asn Glu Val Glu Gly Tyr Phe His Trp Ser Ile Met Asp
405 410 415

Asn Phe Glu Trp Ala Ala Gly Tyr Arg Glu Arg Phe Gly Leu Val His
420 425 430

Val Asp Tyr Glu Thr Leu Val Arg Thr Pro Lys Asp Ser Ala Ala Trp
435 440 445

Tyr Arg Gln Val Ile Gln Ser Asn Gly Ala Val Leu Phe Asp
450 455 460



<210> 5 
<211> 1098 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 5 

atgactcgga ggtctatcgt gcgttcttct tccaacaagt ggcttgtcct tgccggtgcg 60 
gcgctgctcg cctgcaccgc cctcgggtgc aagaaaaaag gcgagagcgg tgacgtcgcc 120 
tcggccccgg ggcaggccca ggcgggcggc aagcagccgt ttcccgacga tgcgccgatc 180 
accgaaccgc ccgctccgcc ccctcgtagc ggcaatcctc tggtgggcgc caagctcttc 240
gtcgacccgg aatctttggc catgttgcag gcgaacaagc tgcggcgcac cgacccggag 300
aaggcggcga ttttggatcg catcgcccag cagccccagg ctttgtggat gggcgagtgg 360
aacacgaaca tcttccgcgc ggtcgagcat ttcgtggctc gcgccaaggc ggagggcgcc 420
gtgcccgtca tgatcgccta caacatcccc caccgcgact gcgggcagta ctctcagggt 480
gggctttcct ccaaggaggc ttaccagcgc tggattcgga acgtcgccgc ggggattggc 540
agcgatgcag cggtcgtcgt gctcgagccc gacgcgctcg gccacttcca ggagtgtttg 600
accgaggagc agagcgccga gcgcatgttc ctgctcagcg acgccgtcaa ggtgctgcgc 660
caaaatccga agacggccgt gtacctggat gccgggcacg cgcgctgggt gccggtggag 720
gagatggccg agcgcctcaa gctcgcgggc atcgagcacg cccatggctt ttcgctcaac 780
acctcgaact acgtgggcac cgaggagaac gccgcttacg gccacaagct cgtcgaggcc 840
ctgggtggga acgtgcgctt cgtcatcgac acgagccgca atggggcggg cccctacgag 900
gaggccaaga acgccgagga gagctggtgc aacccgcccg gtcgcaagat cggcaagccg 960
ccgaccaccg agacggggga tcccctcatc gacggattcc tttggctgaa gcgcccgggc 1020
gagtcggacg gtcagtgcaa cggcgggccc aaggccggtg tgttctggct ggagcaggct 1080
ctccagcagg cccagtaa 1098



<210> 6 
<211> 365 
<212> PRT 
<213> Unknown 



<220>
<223> Obtained from environmental sample

<220>
<221> SIGNAL
<222> (1)...(29)

<220>
<221> DOMAIN
<222> (81)...(358)
<223> Glycosyl hydrolases family 6

<220>
<221> SITE
<222> (187)...(196)
<223> Glycosyl hydrolases family 6 signature 2. Prosite id = PS00656

<220>
<221> SITE
<222> (263)...(266)
<223> N-glycosylation site. Prosite id = PS00001

<400> 6

Met Thr Arg Arg Ser Ile Val Arg Ser Ser Ser Asn Lys Trp Leu Val
1 5 10 15

Leu Ala Gly Ala Ala Leu Leu Ala Cys Thr Ala Leu Gly Cys Lys Lys
20 25 30

Lys Gly Glu Ser Gly Asp Val Ala Ser Ala Pro Gly Gln Ala Gln Ala
35 40 45

Gly Gly Lys Gln Pro Phe Pro Asp Asp Ala Pro Ile Thr Glu Pro Pro
50 55 60

Ala Pro Pro Pro Arg Ser Gly Asn Pro Leu Val Gly Ala Lys Leu Phe
65 70 75 80

Val Asp Pro Glu Ser Leu Ala Met Leu Gln Ala Asn Lys Leu Arg Arg
85 90 95

Thr Asp Pro Glu Lys Ala Ala Ile Leu Asp Arg Ile Ala Gln Gln Pro
100 105 110

Gln Ala Leu Trp Met Gly Glu Trp Asn Thr Asn Ile Phe Arg Ala Val
115 120 125

Glu His Phe Val Ala Arg Ala Lys Ala Glu Gly Ala Val Pro Val Met
130 135 140

Ile Ala Tyr Asn Ile Pro His Arg Asp Cys Gly Gln Tyr Ser Gln Gly
145 150 155 160

Gly Leu Ser Ser Lys Glu Ala Tyr Gln Arg Trp Ile Arg Asn Val Ala
165 170 175

Ala Gly Ile Gly Ser Asp Ala Ala Val Val Val Leu Glu Pro Asp Ala
180 185 190

Leu Gly His Phe Gln Glu Cys Leu Thr Glu Glu Gln Ser Ala Glu Arg
195 200 205

Met Phe Leu Leu Ser Asp Ala Val Lys Val Leu Arg Gln Asn Pro Lys
210 215 220

Thr Ala Val Tyr Leu Asp Ala Gly His Ala Arg Trp Val Pro Val Glu
225 230 235 240

Glu Met Ala Glu Arg Leu Lys Leu Ala Gly Ile Glu His Ala His Gly
245 250 255

Phe Ser Leu Asn Thr Ser Asn Tyr Val Gly Thr Glu Glu Asn Ala Ala
260 265 270

Tyr Gly His Lys Leu Val Glu Ala Leu Gly Gly Asn Val Arg Phe Val
275 280 285

Ile Asp Thr Ser Arg Asn Gly Ala Gly Pro Tyr Glu Glu Ala Lys Asn
290 295 300

Ala Glu Glu Ser Trp Cys Asn Pro Pro Gly Arg Lys Ile Gly Lys Pro
305 310 315 320

Pro Thr Thr Glu Thr Gly Asp Pro Leu Ile Asp Gly Phe Leu Trp Leu
325 330 335

Lys Arg Pro Gly Glu Ser Asp Gly Gln Cys Asn Gly Gly Pro Lys Ala
340 345 350

Gly Val Phe Trp Leu Glu Gln Ala Leu Gln Gln Ala Gln
355 360 365



<210> 7 
<211> 2649 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 7 



atgcaaggaa agaaaattga tttcattaac tcaaggttgt tagttcctga ttatccaatc 60
gttcccttca ttgagggaga tggtaccggc cctgatatct ggcgtgcttc agtcagggtg 120
ctggatgttg ctgttgacag ggcatattcc ggcaagcgaa aacttctctg gaaagaggtg 180
ctggctggcg aaaaggcatt tacaaatacc gggtcctggc ttccggagga aactcttaga 240
gcatttcgtg aatatcatgt tggaattaaa gggccactca ctacgccagt tggtggggga 300
attcgttctc tcaatgtagc cctcaggcaa gagcttgact tgtatgtttg cctgaggcca 360
gtcaaatggt ttaagggtgt accaagtcct ctaaaagatc cttccaaagt ggatatgcat 420
attttccgcg aaaacactga agatatttat gcaggtattg aatttatgca tggtgaaccg 480
gaggccctga aagttaagaa atttcttacc gaagaaatgg gaatcaagaa gtttcggttt 540
cccgatacat cctccattgg tatcaagcct atctcactcg aaggaacaga gcgtcttgta 600
agagcttcca ttcaatatgc acttgacagg aagttgcctt ccgtaacatt ggttcataaa 660
ggcaatatca tgaaattcac cgagggggca ttcaaaaaat ggggttatga acttgccgaa 720
agagaatttg gcgacagggt ttttacatgg tcaatgtatg accgtatcgc cgatgaacat 780
ggaacggaag aagctggcaa agtgcaatcc gaagcgattg caaaaggtaa actcctgata 840
aaggatgtga ttgctgatgc ttttctgcag caaatactac tcaggcctgc cgagtacagc 900
gttatcgcaa ccatgaacct gaatggcgat tatatcagcg atgcactggc agctatggtg 960
gggggtatag gaattgctcc cggagccaat attaaccatc aaactggcca tgcagtcttt 1020
gaagcaacac acggcacggc tcccaaatat gccaaccttg atcaggtaaa ccctggctca 1080
gtaatactaa gtggcgcgct gatgctcgaa tacatgggct ggaacgaagc cgctcagctc 1140
attaccaatg gattggaggc taccattcaa cagaaactgg taacctatga tttccatcgc 1200
ttaatggaag gtgctacaaa gttgaagact tcagaatttg gcgatgctgt gatccggccg 1260
gcacgttccg cctgggcgga cacggctgcc gatgccctct ccgggcggcg gcgtcgtgcg 1320
cggaacggcg ggcttgttgc cccgcccgcg gcctgtcgcc gggggcgggt acgggactca 1380
gcgcttgcgc gcctccttca gggtggactg cagggcgaag aaggccggct tgcggacgaa 1440
cttctccgtc atgaccgtgg cgctgccctc accctcgaag aagaccggca cccacgagta 1500
cttgtcggtg aagccccaga tggtgaagga gttgcagtcg ttcacggcca ggcaggccga 1560
cagtgcctgc tggtagtagt cggcctgctg ccgcagctgc tccttggtgg gcttgccgct 1620
cgccgggagg tccatgcgga cgtcgatctc ggtgatggcg gtctccagac cgaggtcggc 1680
gaaccgctgc aggttctgct gcaggtcgcc cgggaagccg tagcgggtgc tcaggtggcc 1740
ctgggcgccg aatccgtgga gcggcacgcc ctgctccagc atctcctggg cgagctcgta 1800
gtaggcgtcg ctcttggcgt tgatgccctc gacgttgtag tcgttgagga acagcttggc 1860
ctcggggtcg gcctcgtggg cccagcggaa ggcgtccgcg acgatctccg ggccgagctc 1920
acgtatccag atgttctcgt cggtgcgcag ctcggcctgg tcgttgaaga tctcgttggc 1980
cacgtcccac tgctggatct tgccggcgta gcggccgacg accgtgtcga tgtggtcctt 2040
gaggatggcg cgcagttcct ccttggtgaa gtcgccctcc tccagccatt cggggttctg 2100
gctgtgccac aggagggtgt gcccgcgcac ggcctggcgg ttccgctggg cgaactcgac 2160
gatggcgtcg gcctcctcga agcggtactg gtcgcgctcg gggtggatga actcccactt 2220
catctggttc tcggcggaga ccgagttgaa ctgctggccc aggatcttcc ggtacttctt 2280
gfccgaaggtg aaggggtccg ggtagtcctg ttcgaggtgg tggccgccgc cggccgccgc 2340
ggagcctatg aagaaccctt cgggggcggc ccagcgcagg cggtcgaact tggcgttgga 2400
gtggggcgcg gcctcgtggt cggcggacgg cttggccgtg gccgtcgacg tcaccagcgg 2460
gacggccagc gcggcggcga gagcaaaggt gacgatgcgg SLcggatctca tcagaggtcc 2520
ctcattcgat cgcggctccg aaagttttcg gaggattacc ggaatgtttc agggacctta 2580
aggcgcccgg agccgggtcg tcaacggttt ggcccggccc ggtcgaagct tctcccgacc 2640
aggcgttga 2649

<210> 8 
<211> 882 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (20) . . . (417) 

<223> Isocitrate/isopropylmalate dehydrogenase 
<220> 

<221> SITE 

<222> (310) . . . (329) 

<223> Isocitrate and isopropylmalate dehydrogenases signature. Prosite id 
PS00470 

<220> 

<221> SITE 

<222> (868) . . . (871) 

<223> N-glycosylation site. Prosite id = PSOOOOl 
<400> 8 

Met Gln Gly Lys Lys Ile Asp Phe Ile Asn Ser Arg Leu Leu Val Pro
1 5 10 15

Asp Tyr Pro Ile Val Pro Phe Ile Glu Gly Asp Gly Thr Gly Pro Asp
20 25 30

Ile Trp Arg Ala Ser Val Arg Val Leu Asp Val Ala Val Asp Arg Ala
35 40 45

Tyr Ser Gly Lys Arg Lys Leu Leu Trp Lys Glu Val Leu Ala Gly Glu
50 55 60

Lys Ala Phe Thr Asn Thr Gly Ser Trp Leu Pro Glu Glu Thr Leu Arg
65 70 75 80

Ala Phe Arg Glu Tyr His Val Gly Ile Lys Gly Pro Leu Thr Thr Pro
85 90 95

Val Gly Gly Gly Ile Arg Ser Leu Asn Val Ala Leu Arg Gln Glu Leu
100 105 110

Asp Leu Tyr Val Cys Leu Arg Pro Val Lys Trp Phe Lys Gly Val Pro
115 120 125

Ser Pro Leu Lys Asp Pro Ser Lys Val Asp Met His Ile Phe Arg Glu
130 135 140

Asn Thr Glu Asp Ile Tyr Ala Gly Ile Glu Phe Met His Gly Glu Pro
145 150 155 160

Glu Ala Leu Lys Val Lys Lys Phe Leu Thr Glu Glu Met Gly Ile Lys
165 170 175

Lys Phe Arg Phe Pro Asp Thr Ser Ser Ile Gly Ile Lys Pro Ile Ser
180 185 190

Leu Glu Gly Thr Glu Arg Leu Val Arg Ala Ser Ile Gln Tyr Ala Leu
195 200 205

Asp Arg Lys Leu Pro Ser Val Thr Leu Val His Lys Gly Asn Ile Met
210 215 220

Lys Phe Thr Glu Gly Ala Phe Lys Lys Trp Gly Tyr Glu Leu Ala Glu
225 230 235 240

Arg Glu Phe Gly Asp Arg Val Phe Thr Trp Ser Met Tyr Asp Arg Ile
245 250 255

Ala Asp Glu His Gly Thr Glu Glu Ala Gly Lys Val Gln Ser Glu Ala
260 265 270

Ile Ala Lys Gly Lys Leu Leu Ile Lys Asp Val Ile Ala Asp Ala Phe
275 280 285

Leu Gln Gln Ile Leu Leu Arg Pro Ala Glu Tyr Ser Val Ile Ala Thr
290 295 300

Met Asn Leu Asn Gly Asp Tyr Ile Ser Asp Ala Leu Ala Ala Met Val
305 310 315 320

Gly Gly Ile Gly Ile Ala Pro Gly Ala Asn Ile Asn His Gln Thr Gly
325 330 335

His Ala Val Phe Glu Ala Thr His Gly Thr Ala Pro Lys Tyr Ala Asn
340 345 350

Leu Asp Gln Val Asn Pro Gly Ser Val Ile Leu Ser Gly Ala Leu Met
355 360 365

Leu Glu Tyr Met Gly Trp Asn Glu Ala Ala Gln Leu Ile Thr Asn Gly
370 375 380

Leu Glu Ala Thr Ile Gln Gln Lys Leu Val Thr Tyr Asp Phe His Arg
385 390 395 400

Leu Met Glu Gly Ala Thr Lys Leu Lys Thr Ser Glu Phe Gly Asp Ala
405 410 415

Val Ile Arg Pro Ala Arg Ser Ala Trp Ala Asp Thr Ala Ala Asp Ala
420 425 430

Leu Ser Gly Arg Arg Arg Arg Ala Arg Asn Gly Gly Leu Val Ala Pro
435 440 445

Pro Ala Ala Cys Arg Arg Gly Arg Val Arg Asp Ser Ala Leu Ala Arg
450 455 460

Leu Leu Gln Gly Gly Leu Gln Gly Glu Glu Gly Arg Leu Ala Asp Glu
465 470 475 480

Leu Leu Arg His Asp Arg Gly Ala Ala Leu Thr Leu Glu Glu Asp Arg
485 490 495

His Pro Arg Val Leu Val Gly Glu Ala Pro Asp Gly Glu Gly Val Ala
500 505 510

Val Val His Gly Gln Ala Gly Arg Gln Cys Leu Leu Val Val Val Gly
515 520 525

Leu Leu Pro Gln Leu Leu Leu Gly Gly Leu Ala Ala Arg Arg Glu Val
530 535 540

His Ala Asp Val Asp Leu Gly Asp Gly Gly Leu Gln Thr Glu Val Gly
545 550 555 560

Glu Pro Leu Gln Val Leu Leu Gln Val Ala Arg Glu Ala Val Ala Gly
565 570 575

Ala Gln Val Ala Leu Gly Ala Glu Ser Val Glu Arg His Ala Leu Leu
580 585 590

Gln His Leu Leu Gly Glu Leu Val Val Gly Val Ala Leu Gly Val Asp
595 600 605

Ala Leu Asp Val Val Val Val Glu Glu Gln Leu Gly Leu Gly Val Gly
610 615 620

Leu Val Gly Pro Ala Glu Gly Val Arg Asp Asp Leu Arg Ala Glu Leu
625 630 635 640

Thr Tyr Pro Asp Val Leu Val Gly Ala Gln Leu Gly Leu Val Val Glu
645 650 655

Asp Leu Val Gly His Val Pro Leu Leu Asp Leu Ala Gly Val Ala Ala
660 665 670

Asp Asp Arg Val Asp Val Val Leu Glu Asp Gly Ala Gln Phe Leu Leu
675 680 685

Gly Glu Val Ala Leu Leu Gln Pro Phe Gly Val Leu Ala Val Pro Gln
690 695 700

Glu Gly Val Pro Ala His Gly Leu Ala Val Pro Leu Gly Glu Leu Asp
705 710 715 720

Asp Gly Val Gly Leu Leu Glu Ala Val Leu Val Ala Leu Gly Val Asp
725 730 735

Glu Leu Pro Leu His Leu Val Leu Gly Gly Asp Arg Val Glu Leu Leu
740 745 750

Ala Gln Asp Leu Pro Val Leu Leu Val Glu Gly Glu Gly Val Arg Val
755 760 765

Val Leu Phe Glu Val Val Ala Ala Ala Gly Arg Arg Gly Ala Tyr Glu
770 775 780

Glu Pro Phe Gly Gly Gly Pro Ala Gln Ala Val Glu Leu Gly Val Gly
785 790 795 800

Val Gly Arg Gly Leu Val Val Gly Gly Arg Leu Gly Arg Gly Arg Arg
805 810 815

Arg His Gln Arg Asp Gly Gln Arg Gly Gly Glu Ser Lys Gly Asp Asp
820 825 830

Ala Asp Gly Ser His Gln Arg Ser Leu Ile Arg Ser Arg Leu Arg Lys
835 840 845

Phe Ser Glu Asp Tyr Arg Asn Val Ser Gly Thr Leu Arg Arg Pro Glu
850 855 860

Pro Gly Arg Gln Arg Phe Gly Pro Ala Arg Ser Lys Leu Leu Pro Thr
865 870 875 880



Arg Arg 



<210> 9 
<211> 1134 
<212> DNA 
<213> Unknown



<220> 

<223> Obtained from environmental sample 
<400> 9 



atgagatccg tccgcatcgt cacctttgct ctcgccgccg cgctggccgt cccgctggtg 60

acgtcgacgg ccacggccaa gccgtccgcc gaccacgagg ccgcgcccca ctccaacgcc 120

aagttcgacc gcctgcgctg ggccgccccc gaagggttct tcataggctc cgcggcggcc 180

ggcggcggcc accacctcga acaggactac ccggacccct tcaccttcga caagaagtac 240

cggaagatcc tgggccagca gttcaactcg gtctccgccg agaaccagat gaagtgggag 300

ttcatccacc ccgagcgcga ccagtaccgc ttcgaggagg ccgacgccat cgtcgagttc 360

gcccagcgga accgccaggc cgtgcgcggg cacaccctcc tgtggcacag ccagaacccc 420

gaatggctgg aggagggcga cttcaccaag gaggaactgc gcgccatcct caaggaccac 480

atcgacacgg tcgtcggccg ctacgccggc aagatccagc agtgggacgt ggccaacgag 540

atcttcaacg accaggccga gctgcgcacc gacgagaaca tctggatacg tgagctcggc 600

ccggagatcg tcgcggacgc cttccgctgg gcccacgagg ccgaccccga ggccaagctg 660

ttcctcaacg actacaacgt cgagggcatc aacgccaaga gcgacgccta ctacgagctc 720

gcccaggaga tgctggagca gggcgtgccg ctccacggat tcggcgccca gggccacctg 780

agcacccgct acggcttccc gggcgacctg cagcagaacc tgcagcggtt cgccgacctc 840

ggtctggaga ccgccatcac cgagatcgac gtccgcatgg acctcccggc gagcggcaag 900

cccaccaagg agcagctgcg gcagcaggcc gactactacc agcaggcact gtcggcctgc 960

ctggccgtga acgactgcaa ctccttcacc atctggggct tcaccgacaa gtactcgtgg 1020

gtgccggtct tcttcgaggg tgagggcagc gccacggtca tgacggagaa gttcgtccgc 1080

aagccggcct tcttcgccct gcagtccacc ctgaaggagg cgcgcaagcg ctga 1134



<210> 10
<211> 377
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> SIGNAL
<222> (1)...(26)

<220>
<221> DOMAIN
<222> (44)...(371)
<223> Glycosyl hydrolase family 10

<400> 10

Met Arg Ser Val Arg Ile Val Thr Phe Ala Leu Ala Ala Ala Leu Ala
1 5 10 15

Val Pro Leu Val Thr Ser Thr Ala Thr Ala Lys Pro Ser Ala Asp His
20 25 30

Glu Ala Ala Pro His Ser Asn Ala Lys Phe Asp Arg Leu Arg Trp Ala
35 40 45

Ala Pro Glu Gly Phe Phe Ile Gly Ser Ala Ala Ala Gly Gly Gly His
50 55 60

His Leu Glu Gln Asp Tyr Pro Asp Pro Phe Thr Phe Asp Lys Lys Tyr
65 70 75 80

Arg Lys Ile Leu Gly Gln Gln Phe Asn Ser Val Ser Ala Glu Asn Gln
85 90 95

Met Lys Trp Glu Phe Ile His Pro Glu Arg Asp Gln Tyr Arg Phe Glu
100 105 110

Glu Ala Asp Ala Ile Val Glu Phe Ala Gln Arg Asn Arg Gln Ala Val
115 120 125

Arg Gly His Thr Leu Leu Trp His Ser Gln Asn Pro Glu Trp Leu Glu
130 135 140

Glu Gly Asp Phe Thr Lys Glu Glu Leu Arg Ala Ile Leu Lys Asp His
145 150 155 160

Ile Asp Thr Val Val Gly Arg Tyr Ala Gly Lys Ile Gln Gln Trp Asp
165 170 175

Val Ala Asn Glu Ile Phe Asn Asp Gln Ala Glu Leu Arg Thr Asp Glu
180 185 190

Asn Ile Trp Ile Arg Glu Leu Gly Pro Glu Ile Val Ala Asp Ala Phe
195 200 205

Arg Trp Ala His Glu Ala Asp Pro Glu Ala Lys Leu Phe Leu Asn Asp
210 215 220

Tyr Asn Val Glu Gly Ile Asn Ala Lys Ser Asp Ala Tyr Tyr Glu Leu
225 230 235 240

Ala Gln Glu Met Leu Glu Gln Gly Val Pro Leu His Gly Phe Gly Ala
245 250 255

Gln Gly His Leu Ser Thr Arg Tyr Gly Phe Pro Gly Asp Leu Gln Gln
260 265 270

Asn Leu Gln Arg Phe Ala Asp Leu Gly Leu Glu Thr Ala Ile Thr Glu
275 280 285

Ile Asp Val Arg Met Asp Leu Pro Ala Ser Gly Lys Pro Thr Lys Glu
290 295 300

Gln Leu Arg Gln Gln Ala Asp Tyr Tyr Gln Gln Ala Leu Ser Ala Cys
305 310 315 320

Leu Ala Val Asn Asp Cys Asn Ser Phe Thr Ile Trp Gly Phe Thr Asp
325 330 335

Lys Tyr Ser Trp Val Pro Val Phe Phe Glu Gly Glu Gly Ser Ala Thr
340 345 350

Val Met Thr Glu Lys Phe Val Arg Lys Pro Ala Phe Phe Ala Leu Gln
355 360 365



Ser Thr Leu Lys Glu Ala Arg Lys Arg 
370 375 



<210> 11 
<211> 1080 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 



<400> 11 

atgccctgga gctcatcaac gggacctgca cctatgacga gtaacccgcc cctcaaacgc 60

cccctgcgta tcggtctggt cggcacgggc atcggctcac tgcacgccgc cggaatttcc 120

cggatgcctc agcttgccac gctgggggcc atctgtgggc ttgataccca cgccgtgaat 180

gccctagcca cacgctacgg ggtagaaaaa accacatctc gctatgagga tttactgaac 240

gatcccggcc ttgatgtcat cgatctgtgc gttcctcacg atgaacacat gcccatggcc 300

attgccgccg cccgggccgg aaaacatctc ctcatcgaaa aacctttggc ccgcaccctg 360

gaagaggccg atgcaatcct cgaggccgtg aaaagcgccg gtgtaacgct gatgatggga 420

cacaaccagc gttactacgc ccatcacgcc agggctaaag cattggtcga cgccggggtc 480

atcggaaaac cctacatgat cgtagcttcg gttcatgtgc acgggcagat tgatggtttt 540

cgccgctttc ttaagcacgc cgggggtggc acgttgatcg attcgggagt gcaccgcttc 600

gacctcattc gctggatcat gggtgaagtc gagaccgtct tcgctcaaac gggtcgcttc 660

ctccagatgc aaatggaagg agaagactgc gcggtggtca ccctccgctt ccgcagcgga 720

gccatcggga gcttctcatg cagctggagc gccaaaggcc ctgttccaga agaaacattg 780

caaattttcg gcccctatgg ttcgatttat accgaagacc acacccgcac cttacgcctt 840

tacaccgaaa gacccacccc cgaactggaa gacgtaaggc agtttgtctt cccggtcgat 900

caggctgagt ccatccgccg catgattgaa gcgcacttca ccagcctgca acaggggtta 960

ccccctccga tcaccggtat ggacggacgc gcttcccttg agctcagcat ggcctcctat 1020

cgctcggctc aaaccggcca gcctgttcat cttccccttc agagaggaaa ccagaaatga 1080



<210> 12
<211> 359
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (22)...(141)
<223> Oxidoreductase family, NAD-binding Rossmann fold

<220>
<221> DOMAIN
<222> (153)...(260)
<223> Oxidoreductase family, C-terminal alpha/beta domain

<400> 12

Met Pro Trp Ser Ser Ser Thr Gly Pro Ala Pro Met Thr Ser Asn Pro
1 5 10 15

Pro Leu Lys Arg Pro Leu Arg Ile Gly Leu Val Gly Thr Gly Ile Gly
20 25 30

Ser Leu His Ala Ala Gly Ile Ser Arg Met Pro Gln Leu Ala Thr Leu
35 40 45

Gly Ala Ile Cys Gly Leu Asp Thr His Ala Val Asn Ala Leu Ala Thr
50 55 60

Arg Tyr Gly Val Glu Lys Thr Thr Ser Arg Tyr Glu Asp Leu Leu Asn
65 70 75 80

Asp Pro Gly Leu Asp Val Ile Asp Leu Cys Val Pro His Asp Glu His
85 90 95

Met Pro Met Ala Ile Ala Ala Ala Arg Ala Gly Lys His Leu Leu Ile
100 105 110

Glu Lys Pro Leu Ala Arg Thr Leu Glu Glu Ala Asp Ala Ile Leu Glu
115 120 125

Ala Val Lys Ser Ala Gly Val Thr Leu Met Met Gly His Asn Gln Arg
130 135 140

Tyr Tyr Ala His His Ala Arg Ala Lys Ala Leu Val Asp Ala Gly Val
145 150 155 160

Ile Gly Lys Pro Tyr Met Ile Val Ala Ser Val His Val His Gly Gln
165 170 175

Ile Asp Gly Phe Arg Arg Phe Leu Lys His Ala Gly Gly Gly Thr Leu
180 185 190

Ile Asp Ser Gly Val His Arg Phe Asp Leu Ile Arg Trp Ile Met Gly
195 200 205

Glu Val Glu Thr Val Phe Ala Gln Thr Gly Arg Phe Leu Gln Met Gln
210 215 220

Met Glu Gly Glu Asp Cys Ala Val Val Thr Leu Arg Phe Arg Ser Gly
225 230 235 240

Ala Ile Gly Ser Phe Ser Cys Ser Trp Ser Ala Lys Gly Pro Val Pro
245 250 255

Glu Glu Thr Leu Gln Ile Phe Gly Pro Tyr Gly Ser Ile Tyr Thr Glu
260 265 270

Asp His Thr Arg Thr Leu Arg Leu Tyr Thr Glu Arg Pro Thr Pro Glu
275 280 285

Leu Glu Asp Val Arg Gln Phe Val Phe Pro Val Asp Gln Ala Glu Ser
290 295 300

Ile Arg Arg Met Ile Glu Ala His Phe Thr Ser Leu Gln Gln Gly Leu
305 310 315 320

Pro Pro Pro Ile Thr Gly Met Asp Gly Arg Ala Ser Leu Glu Leu Ser
325 330 335

Met Ala Ser Tyr Arg Ser Ala Gln Thr Gly Gln Pro Val His Leu Pro
340 345 350

Leu Gln Arg Gly Asn Gln Lys
355

<210> 13 
<211> 1038 
<212> DNA
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 13 

atgagcccgg tgcgcgttgc tgtcatcggc gccgggcaaa ttgcccagcg cgggcattta 60 

cccgggcttc tggaagctgg cgccgaaatt accgttctgt gcgataattc ccttcctcag 120 

cttgaagaaa ttggggccaa atttcacgtt caccgggtct accgcgactg gcacgccatg 180 

ctggatgccg gcggattcga agccgtcacc atttgtaccc cgcccttcct ccatgccgag 240 

atggccatcg aatgtgcccg cagagggttg catgtactgg tagaaaaacc catggctgta 300 

aatctccaac aatgcgatca aatgatcgcc gcgtctgaac aggccggaac catcttaatg 360 

gtctcgcata accagcgctt tatggaggca catcgtctgg ccaaagaaat ccttgatgcc 420 

ggcctcctcg gcaggctcta cctggcgcac ggggtctttg gccacggcgg cccggaggtt 480 

tggagcccaa cccagcaatg gtacttccga cctgaccgcg ccggcgctgg cgtgatcgct 540 

gacctggggt atcataaact tgacctgatc cgctggctca ccgggcaaga aattaccgcg 600

gtgggagcac tgggcgccac ctttgaaaag caaacctcgc ttgaagactc tgctgtgatg 660 

ctggttcacc tttcggaggg tactctcgcc accatccagg taagctgggt gttcaggcct 720 

gactgggaaa acagcctggt ccttcgagga gaacgggggg tgctcgccat ccccactgat 780 

gcctcgcaac ccctgcgggt ctcttacata tcttcttcgg gtcaggtcat tgaaagtacg 840

catcgttgcg actccggcga tacctccggc tggttcggag cgatccgggc atttctcacc 900

gcgatcgaaa aaagcgctcc cgctcccatt gacggaaaag aagggcgtgc tgtcatggcg 960 

gcagttctgg cggccacacg ctccattcaa aaacatacga tcatttctat aaccgaggta 1020 

gaaaccatcc atgactga 1038 

<210> 14
<211> 345
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (4)...(123)
<223> Oxidoreductase family, NAD-binding Rossmann fold

<220>
<221> DOMAIN
<222> (135)...(248)
<223> Oxidoreductase family, C-terminal alpha/beta domain

<400> 14

Met Ser Pro Val Arg Val Ala Val Ile Gly Ala Gly Gln Ile Ala Gln
1 5 10 15

Arg Gly His Leu Pro Gly Leu Leu Glu Ala Gly Ala Glu Ile Thr Val
20 25 30

Leu Cys Asp Asn Ser Leu Pro Gln Leu Glu Glu Ile Gly Ala Lys Phe
35 40 45

His Val His Arg Val Tyr Arg Asp Trp His Ala Met Leu Asp Ala Gly
50 55 60

Gly Phe Glu Ala Val Thr Ile Cys Thr Pro Pro Phe Leu His Ala Glu
65 70 75 80

Met Ala Ile Glu Cys Ala Arg Arg Gly Leu His Val Leu Val Glu Lys
85 90 95

Pro Met Ala Val Asn Leu Gln Gln Cys Asp Gln Met Ile Ala Ala Ser
100 105 110

Glu Gln Ala Gly Thr Ile Leu Met Val Ser His Asn Gln Arg Phe Met
115 120 125

Glu Ala His Arg Leu Ala Lys Glu Ile Leu Asp Ala Gly Leu Leu Gly
130 135 140

Arg Leu Tyr Leu Ala His Gly Val Phe Gly His Gly Gly Pro Glu Val
145 150 155 160

Trp Ser Pro Thr Gln Gln Trp Tyr Phe Arg Pro Asp Arg Ala Gly Ala
165 170 175

Gly Val Ile Ala Asp Leu Gly Tyr His Lys Leu Asp Leu Ile Arg Trp
180 185 190

Leu Thr Gly Gln Glu Ile Thr Ala Val Gly Ala Leu Gly Ala Thr Phe
195 200 205

Glu Lys Gln Thr Ser Leu Glu Asp Ser Ala Val Met Leu Val His Leu
210 215 220

Ser Glu Gly Thr Leu Ala Thr Ile Gln Val Ser Trp Val Phe Arg Pro
225 230 235 240

Asp Trp Glu Asn Ser Leu Val Leu Arg Gly Glu Arg Gly Val Leu Ala
245 250 255

Ile Pro Thr Asp Ala Ser Gln Pro Leu Arg Val Ser Tyr Ile Ser Ser
260 265 270

Ser Gly Gln Val Ile Glu Ser Thr His Arg Cys Asp Ser Gly Asp Thr
275 280 285

Ser Gly Trp Phe Gly Ala Ile Arg Ala Phe Leu Thr Ala Ile Glu Lys
290 295 300

Ser Ala Pro Ala Pro Ile Asp Gly Lys Glu Gly Arg Ala Val Met Ala
305 310 315 320

Ala Val Leu Ala Ala Thr Arg Ser Ile Gln Lys His Thr Ile Ile Ser
325 330 335

Ile Thr Glu Val Glu Thr Ile His Asp
340 345



<210> 15 
<211> 1347 
<212> DNA 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<400> 15



atgactgacc atcgttttcc aaaaggattc atctggggaa ccgctacggc gtctttccag 60

attgaaggcg ccacccgcga agatggccgg ggcgaatcca tctgggaccg cttctgcgcc 120

acgccgggga aaattgtcac gggcgaaacc ggcgatcctg cctgcgactc ctatcatcgt 180

taccctgaag acatcgccct gatgaaggct atgtcgctca atggttaccg cttttcaatc 240

gcctggcctc gcgtcattcc tgacggagac ggtaaagtct gtcaggccgg gctcgactac 300

tacgatcgtg tggtagatgc tctcctggcg gagaatatcc aaccttttat caccctgtac 360

cactgggacc tgccccaggc attacaggat cggggtggct ggggcaaccg tgccacggtt 420

gaggcgttca ctcgctacgt agatattgtg gtttctcgcc tgggtgaccg cgtaaagtac 480

tggatgacac acaacgaacc ctggtgtgta tccattttga gccatgagct tggtgaacat 540

gcccccgggt tgaaggaccg aaaactggcc ctccaggtgg cgcaccatgt cctcgtttct 600

cacggcctgg ccgtgcccat catccgccag cgttgtaaag aggcgcaggt tggcatcgtg 660

ttgaattttt cacctgctta cccggccacc gatagcctgg ccgaccagat ggccacccgt 720

cagcaccacg cccggtttaa cctctggttc ctcgatccca tcgccgggcg cggctacccg 780

caggatgcct gggaagggta cggagccgat gttcccgcca tgaggcctga tgacatgcag 840

atcatcgccg cccccatcga cttcctgggc gtcaatttct acagtcgggc ggtctgccac 900

gatccggccg ggggcgaagg ttcccgggtg ctcaatgtgc gcagtaaaac cgaggccacc 960

gatcgagact gggagattta ccctcaggcg ctctacgatt tactcatctg gatccacaat 1020

ggataccagt tcagagatat ttacattacc gagaatggcg cctcatacaa cgatgtggtc 1080

tccccggatg ggaaagtgca cgatcctaaa cgtctggact atctgaaacg ccatctggcc 1140

atggctctgc gggccatcga agcgggcgtt ccactgcgtg gttatttctg ctggagcttg 1200

atggacaact tcgaatgggc catgggcacc agcagccgat tcgggttggc ctacaccgac 1260

ttcactaccc agaagcgtat tctcaaagac agtgggctct ggtttggcga agtggcacgg 1320

gcaaacgcct taatcgacct tccctga 1347



<210> 16
<211> 448
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (2)...(444)
<223> Glycosyl hydrolase family 1



wo 2006/101584 



25/248 



PCT/US2006/002516 



<220>
<221> SITE
<222> (10)...(24)
<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<220>
<221> SITE
<222> (352)...(360)
<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572

<400> 16

Met Thr Asp His Arg Phe Pro Lys Gly Phe Ile Trp Gly Thr Ala Thr
1 5 10 15

Ala Ser Phe Gln Ile Glu Gly Ala Thr Arg Glu Asp Gly Arg Gly Glu
20 25 30

Ser Ile Trp Asp Arg Phe Cys Ala Thr Pro Gly Lys Ile Val Thr Gly
35 40 45

Glu Thr Gly Asp Pro Ala Cys Asp Ser Tyr His Arg Tyr Pro Glu Asp
50 55 60

Ile Ala Leu Met Lys Ala Met Ser Leu Asn Gly Tyr Arg Phe Ser Ile
65 70 75 80

Ala Trp Pro Arg Val Ile Pro Asp Gly Asp Gly Lys Val Cys Gln Ala
85 90 95

Gly Leu Asp Tyr Tyr Asp Arg Val Val Asp Ala Leu Leu Ala Glu Asn
100 105 110

Ile Gln Pro Phe Ile Thr Leu Tyr His Trp Asp Leu Pro Gln Ala Leu
115 120 125

Gln Asp Arg Gly Gly Trp Gly Asn Arg Ala Thr Val Glu Ala Phe Thr
130 135 140

Arg Tyr Val Asp Ile Val Val Ser Arg Leu Gly Asp Arg Val Lys Tyr
145 150 155 160

Trp Met Thr His Asn Glu Pro Trp Cys Val Ser Ile Leu Ser His Glu
165 170 175

Leu Gly Glu His Ala Pro Gly Leu Lys Asp Arg Lys Leu Ala Leu Gln
180 185 190

Val Ala His His Val Leu Val Ser His Gly Leu Ala Val Pro Ile Ile
195 200 205

Arg Gln Arg Cys Lys Glu Ala Gln Val Gly Ile Val Leu Asn Phe Ser
210 215 220

Pro Ala Tyr Pro Ala Thr Asp Ser Leu Ala Asp Gln Met Ala Thr Arg
225 230 235 240

Gln His His Ala Arg Phe Asn Leu Trp Phe Leu Asp Pro Ile Ala Gly
245 250 255

Arg Gly Tyr Pro Gln Asp Ala Trp Glu Gly Tyr Gly Ala Asp Val Pro
260 265 270

Ala Met Arg Pro Asp Asp Met Gln Ile Ile Ala Ala Pro Ile Asp Phe
275 280 285

Leu Gly Val Asn Phe Tyr Ser Arg Ala Val Cys His Asp Pro Ala Gly
290 295 300

Gly Glu Gly Ser Arg Val Leu Asn Val Arg Ser Lys Thr Glu Ala Thr
305 310 315 320

Asp Arg Asp Trp Glu Ile Tyr Pro Gln Ala Leu Tyr Asp Leu Leu Ile
325 330 335

Trp Ile His Asn Gly Tyr Gln Phe Arg Asp Ile Tyr Ile Thr Glu Asn
340 345 350

Gly Ala Ser Tyr Asn Asp Val Val Ser Pro Asp Gly Lys Val His Asp
355 360 365

Pro Lys Arg Leu Asp Tyr Leu Lys Arg His Leu Ala Met Ala Leu Arg
370 375 380

Ala Ile Glu Ala Gly Val Pro Leu Arg Gly Tyr Phe Cys Trp Ser Leu
385 390 395 400

Met Asp Asn Phe Glu Trp Ala Met Gly Thr Ser Ser Arg Phe Gly Leu
405 410 415

Ala Tyr Thr Asp Phe Thr Thr Gln Lys Arg Ile Leu Lys Asp Ser Gly
420 425 430

Leu Trp Phe Gly Glu Val Ala Arg Ala Asn Ala Leu Ile Asp Leu Pro
435 440 445

<210> 17
<211> 1215
<212> DNA
<213> Unknown

<220>
<223> Obtained from environmental sample

<400> 17



atgcggtacg tgctgatttc ctgccttgcg ctggcttccc tgtgcgcgca gcctcttcct 60

gtttccacgc ctgaaaaaga gggcttctcg gcggagcgcc tcgggcggat gcaccggtat 120

ttcgagaacc tgacgaaaac cggagagcgg cctggcgcga tcacgctgat cgtgcgcaac 180

gggcgcatcg tggactggcg cacgttcggg ctgcgcgacg tcgagaacaa tctgccgatg 240

gagaaggaca cgatcgtcca catctactcg atgacgaagc cggtgacgtc cgtggccgtg 300

atgatgctgg tggaggaggg caggctggcg ctggacgacc gggtggacaa gttcattccc 360

gagttcaagg ggatgaaggt gtacaagggc ggcacggtgg agcggccgga gctggaggac 420

gcggcgcggc cgatcacggt gaagcatctg ctgacgcaca cgagcgggct gagctacggc 480

tggggcaacg acaacgtctc cgcgatgtac cgcaaggccg acccgctcgg cgcgccgagc 540

ctgaaagagt ttatcgacag gctggtgaaa ctgccgctgg cattccaccc gggcgagcgt 600

u d ^ ^ dg U C cgatgtcgat cgacgtgctg ggctacctgg tggaggctgt ctccggcgag 660

ccgttcgatc agttcgtgga gaagcggatc acggggccgc tgaagatgaa cgacacgcat 720

ttcagactgc cggaggcgaa gcgggcgcgg ctggcgaaga tctactcgcg gcgcgagggg 780

aagctgacgg cgcagcgcgg cctgcagacg ggaggcgttc cgtacggcgg catggggctg 840

tactcgacga tcggcgacta tgcgcggttc gcgcagatgc tgttgaacgg cggccatctc 900

gacggagtgc gcctgctggg gcggaagacg gtggatctga tgatgatgaa ccatctgggc 960

ggactgtcga agccgacgat cggcggcgat gattcagcgg gattcggact gggcggagcg 1020

gtgcggatcg atccggcgaa atcgggccgt ccgggcacgg aaggactctt cggctgggac 1080

ggggcggctt cgacgtattt ccgggtggac cggaaagaga agctggcgat gctgctgttc 1140

ctgcaatgga tgccgtttga tcaggggacg ctgaacctgt acgagacgct ggtctaccaa 1200

gctctggtgg actga 1215



<210> 18
<211> 404
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (37)...(400)
<223> Beta-lactamase

<220>
<221> SITE
<222> (43)...(46)
<223> N-glycosylation site. Prosite id = PS00001

<220>
<221> SITE
<222> (167)...(170)
<223> N-glycosylation site. Prosite id = PS00001

<220>
<221> SITE
<222> (240)...(243)
<223> N-glycosylation site. Prosite id = PS00001

<400> 18

Met Arg Tyr Val Leu Ile Ser Cys Leu Ala Leu Ala Ser Leu Cys Ala
1 5 10 15

Gln Pro Leu Pro Val Ser Thr Pro Glu Lys Glu Gly Phe Ser Ala Glu
20 25 30

Arg Leu Gly Arg Met His Arg Tyr Phe Glu Asn Leu Thr Lys Thr Gly
35 40 45

Glu Arg Pro Gly Ala Ile Thr Leu Ile Val Arg Asn Gly Arg Ile Val
50 55 60

Asp Trp Arg Thr Phe Gly Leu Arg Asp Val Glu Asn Asn Leu Pro Met
65 70 75 80

Glu Lys Asp Thr Ile Val His Ile Tyr Ser Met Thr Lys Pro Val Thr
85 90 95

Ser Val Ala Val Met Met Leu Val Glu Glu Gly Arg Leu Ala Leu Asp
100 105 110

Asp Arg Val Asp Lys Phe Ile Pro Glu Phe Lys Gly Met Lys Val Tyr
115 120 125

Lys Gly Gly Thr Val Glu Arg Pro Glu Leu Glu Asp Ala Ala Arg Pro
130 135 140

Ile Thr Val Lys His Leu Leu Thr His Thr Ser Gly Leu Ser Tyr Gly
145 150 155 160

Trp Gly Asn Asp Asn Val Ser Ala Met Tyr Arg Lys Ala Asp Pro Leu
165 170 175

Gly Ala Pro Ser Leu Lys Glu Phe Ile Asp Arg Leu Val Lys Leu Pro
180 185 190

Leu Ala Phe His Pro Gly Glu Arg Tyr Glu Tyr Ser Met Ser Ile Asp
195 200 205

Val Leu Gly Tyr Leu Val Glu Ala Val Ser Gly Glu Pro Phe Asp Gln
210 215 220

Phe Val Glu Lys Arg Ile Thr Gly Pro Leu Lys Met Asn Asp Thr His
225 230 235 240

Phe Arg Leu Pro Glu Ala Lys Arg Ala Arg Leu Ala Lys Ile Tyr Ser
245 250 255

Arg Arg Glu Gly Lys Leu Thr Ala Gln Arg Gly Leu Gln Thr Gly Gly
260 265 270

Val Pro Tyr Gly Gly Met Gly Leu Tyr Ser Thr Ile Gly Asp Tyr Ala
275 280 285

Arg Phe Ala Gln Met Leu Leu Asn Gly Gly His Leu Asp Gly Val Arg
290 295 300

Leu Leu Gly Arg Lys Thr Val Asp Leu Met Met Met Asn His Leu Gly
305 310 315 320

Gly Leu Ser Lys Pro Thr Ile Gly Gly Asp Asp Ser Ala Gly Phe Gly
325 330 335

Leu Gly Gly Ala Val Arg Ile Asp Pro Ala Lys Ser Gly Arg Pro Gly
340 345 350

Thr Glu Gly Leu Phe Gly Trp Asp Gly Ala Ala Ser Thr Tyr Phe Arg
355 360 365

Val Asp Arg Lys Glu Lys Leu Ala Met Leu Leu Phe Leu Gln Trp Met
370 375 380

Pro Phe Asp Gln Gly Thr Leu Asn Leu Tyr Glu Thr Leu Val Tyr Gln
385 390 395 400

Ala Leu Val Asp

<210> 19
<211> 1794
<212> DNA
<213> Unknown

<220>
<223> Obtained from environmental sample

<400> 19



atgcccgttt tgttcgccct gtttcttgtt gcctcgtcct gcgcggcgca gtcgctggcc 60

gggccggttt ccctgcttgg cggagatgcg ggcgcggcgt tccgctatac cgggccatcg 120

gcgggcgcgg cgagcggctc ggccgaatgg gtggcggtgg agaacatgcc gttcacgcac 180

gcctggcggc tgcgcacgaa tccgctgccg gagagcggcg gcaacgaatg ggacctgcgc 240

atccgcgccc gcggagcggc ggctgtttcg gcaggggaca agatcctggc cgagttctgg 300

atgcgctgcg tggagcccga aaacggcgac tgcattctgc gcctgaacgt ggagcgcgac 360

gggtcgccgt ggaccaaatc catcagcaac ccctacccgg tgggccggga gtggcggcgg 420

ttccgcgtgc tgttcgagat gcgggagagc tacgccgccg gcggctacat gatcgatttc 480

tggatgggcc agcaggtgca gacggcggaa gtgggcggga tttccctgct gaattacggt 540

ccgcaggcca cggccgagca gcttggcctg gaccggtttt atgagggcgc ggcggcggac 600

gccgcgtggc ggcaggcggc cgagcagcgg atcgaggaga tccggaaagc gggcatgatc 660

atcgtggcgg tgacgccgga cggcgagccg atcgagggcg ctgaaatccg ggcgaagctg 720

aagcggcacg cgttcgggtg gggcacggct gtggcggcat cacggcttct ggggacggga 780

acggacagcg agcgctaccg caacttcatc cgcgagaact tcaacatggc ggtgctcgag 840

aacgacctga aatggggccc gttcgaagag aaccgcaacc gcgcgatgaa cgcgctgcgc 900

tggctgcatg agaacgggat cacgtggatc cgcgggcaca atctcgtctg gccgggctgg 960

cggtggatgc cgaacgacgt gcgcaacctg gcgaacaatc ccgaggcgct gcggcagcgg 1020

attctggacc gcatccggga cacggccacg gccacgcgcg ggctggtggt gcactgggac 1080

gtcgtcaacg agccggtggc cgagcgcgac gtgctgaaca ttctgggcga cgaggtgatg 1140

gcggactggt tccgcgccgc gaaggagtgc gatcccgagg cgaggatgtt catcaatgag 1200

tacgacattc tggcggcgaa cggggccaat ctgcggaagc agaacgcgta ttaccgcatg 1260

atcgagatgc tgttgaagct cgaggcgccg gtggagggca tcggcttcca gggccacttc 1320

gacacggcca cgccgccgga gcggatgctg gagatcatga accggtacgc ccggctcggg 1380

ctgccgatcg ccatcaccga gtacgatttc gccacggcgg acgaggagct gcaggcgcag 1440

ttcacgcgcg acctgatgat tctcgccttc agccatccgg cggtttcgga cttcctgatg 1500

tggggcttct gggaagggag ccactggaag ccgctgggcg ccatgatccg gcgcgactgg 1560

agcgagaagc cgatgtaccg cgtctggcgc gagctgatct tcgagcgctg gcagacggat 1620

gaaacaggcg tgacgccgga gcacggtgcc atctacgtgc ggggcttcaa gggcgactac 1680

gagatcacgg tgaaggcggg cgggcaggaa gtccgggtgc cgtacacgct gaaagaagac 1740

ggccaggtgc tgtgggtgac ggtgggcggg gcttctgaag agcgcgtgca gtaa 1794



<210> 20
<211> 597
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> SIGNAL
<222> (1)...(20)

<220>
<221> DOMAIN
<222> (235)...(533)
<223> Glycosyl hydrolase family 10

<220>
<221> SITE
<222> (467)...(477)
<223> Glycosyl hydrolases family 10 active site. Prosite id = PS00591

<400> 20

Met Pro Val Leu Phe Ala Leu Phe Leu Val Ala Ser Ser Cys Ala Ala
1 5 10 15

Gln Ser Leu Ala Gly Pro Val Ser Leu Leu Gly Gly Asp Ala Gly Ala
20 25 30

Ala Phe Arg Tyr Thr Gly Pro Ser Ala Gly Ala Ala Ser Gly Ser Ala
35 40 45

Glu Trp Val Ala Val Glu Asn Met Pro Phe Thr His Ala Trp Arg Leu
50 55 60

Arg Thr Asn Pro Leu Pro Glu Ser Gly Gly Asn Glu Trp Asp Leu Arg
65 70 75 80

Ile Arg Ala Arg Gly Ala Ala Ala Val Ser Ala Gly Asp Lys Ile Leu
85 90 95

Ala Glu Phe Trp Met Arg Cys Val Glu Pro Glu Asn Gly Asp Cys Ile
100 105 110

Leu Arg Leu Asn Val Glu Arg Asp Gly Ser Pro Trp Thr Lys Ser Ile
115 120 125

Ser Asn Pro Tyr Pro Val Gly Arg Glu Trp Arg Arg Phe Arg Val Leu
130 135 140

Phe Glu Met Arg Glu Ser Tyr Ala Ala Gly Gly Tyr Met Ile Asp Phe
145 150 155 160

Trp Met Gly Gln Gln Val Gln Thr Ala Glu Val Gly Gly Ile Ser Leu
165 170 175

Leu Asn Tyr Gly Pro Gln Ala Thr Ala Glu Gln Leu Gly Leu Asp Arg
180 185 190

Phe Tyr Glu Gly Ala Ala Ala Asp Ala Ala Trp Arg Gln Ala Ala Glu
195 200 205

Gln Arg Ile Glu Glu Ile Arg Lys Ala Gly Met Ile Ile Val Ala Val
210 215 220

Thr Pro Asp Gly Glu Pro Ile Glu Gly Ala Glu Ile Arg Ala Lys Leu
225 230 235 240

Lys Arg His Ala Phe Gly Trp Gly Thr Ala Val Ala Ala Ser Arg Leu
245 250 255

Leu Gly Thr Gly Thr Asp Ser Glu Arg Tyr Arg Asn Phe Ile Arg Glu
260 265 270

Asn Phe Asn Met Ala Val Leu Glu Asn Asp Leu Lys Trp Gly Pro Phe
275 280 285

Glu Glu Asn Arg Asn Arg Ala Met Asn Ala Leu Arg Trp Leu His Glu
290 295 300

Asn Gly Ile Thr Trp Ile Arg Gly His Asn Leu Val Trp Pro Gly Trp
305 310 315 320

Arg Trp Met Pro Asn Asp Val Arg Asn Leu Ala Asn Asn Pro Glu Ala
325 330 335

Leu Arg Gln Arg Ile Leu Asp Arg Ile Arg Asp Thr Ala Thr Ala Thr
340 345 350

Arg Gly Leu Val Val His Trp Asp Val Val Asn Glu Pro Val Ala Glu
355 360 365

Arg Asp Val Leu Asn Ile Leu Gly Asp Glu Val Met Ala Asp Trp Phe
370 375 380

Arg Ala Ala Lys Glu Cys Asp Pro Glu Ala Arg Met Phe Ile Asn Glu
385 390 395 400

Tyr Asp Ile Leu Ala Ala Asn Gly Ala Asn Leu Arg Lys Gln Asn Ala
405 410 415

Tyr Tyr Arg Met Ile Glu Met Leu Leu Lys Leu Glu Ala Pro Val Glu
420 425 430

Gly Ile Gly Phe Gln Gly His Phe Asp Thr Ala Thr Pro Pro Glu Arg
435 440 445

Met Leu Glu Ile Met Asn Arg Tyr Ala Arg Leu Gly Leu Pro Ile Ala
450 455 460

Ile Thr Glu Tyr Asp Phe Ala Thr Ala Asp Glu Glu Leu Gln Ala Gln
465 470 475 480

Phe Thr Arg Asp Leu Met Ile Leu Ala Phe Ser His Pro Ala Val Ser
485 490 495

Asp Phe Leu Met Trp Gly Phe Trp Glu Gly Ser His Trp Lys Pro Leu
500 505 510

Gly Ala Met Ile Arg Arg Asp Trp Ser Glu Lys Pro Met Tyr Arg Val
515 520 525

Trp Arg Glu Leu Ile Phe Glu Arg Trp Gln Thr Asp Glu Thr Gly Val
530 535 540

Thr Pro Glu His Gly Ala Ile Tyr Val Arg Gly Phe Lys Gly Asp Tyr
545 550 555 560

Glu Ile Thr Val Lys Ala Gly Gly Gln Glu Val Arg Val Pro Tyr Thr
565 570 575

Leu Lys Glu Asp Gly Gln Val Leu Trp Val Thr Val Gly Gly Ala Ser
580 585 590

Glu Glu Arg Val Gln
595



<210> 21 
<211> 1032 
<212> DNA

<213> Clostridium thermocellum 
<400> 21 



atggtgagtt ttaaagcagg tataaattta ggcggatgga tatcacaata tcaagttttc 60

agcaaagagc atttcgatac attcattacg gagaaggaca ttgaaactat tgcagaagca 120

gggtttgacc atgtcagact gccttttgat tatccaatta tcgagtctga tgacaatgtg 180

ggagaatata aagaagatgg gctttcttat attgaccggt gccttgagtg gtgtaaaaaa 240

tacaatttgg ggcttgtgtt ggatatgcat cacgctcccg ggtaccgctt tcaagatttt 300

aagacaagca ccttgtttga agatccgaac cagcaaaaga gatttgttga catatggaga 360

tttttagcca agcgttacat aaatgaacgg gaacatattg cctttgaact gttaaatgaa 420

gttgttgagc ctgacagtac ccgctggaac aagttgatgc ttgagtgtgt aaaagcaatc 480

^gggaaattg attccaccag gtggctttac attgggggca ataactataa cagtcctgat 540

gagcttaaaa accttgcaga tattgatgat gattacatag tttacaattt ccatttttac 600

aatccttttt tctttacgca tcagaaagcc cactggtcgg aaagtgccat ggcgtacaac 660

aggactgtaa aatatccggg acaatatgag ggaattgaag agtttgtgaa aaataatcct 720

aagtacagtt ttatgatgga attgaataac ctgaagctga ataaagagct tttgcgcaaa 780

gatttaaaac cagcaattga gttcagggaa aagaaaaaat gcaaactata ttgcggggag 840

tttggcgtaa ttgccattgc tgacctggag tccaggataa aatggcatga agattatata 900

agtcttctag Q-ggagtatga tatcggcggc gcggtgtgga actacaaaaa aatggatttt 960

gaaatttata atgaggatag aaaacctgtc tcgcaagaat tggtaaatat actggcgaga 1020

agaaaaactt ga 1032



<210> 22 
<211> 343 
<212> PRT 

<213> Clostridium thermocellum 



<220>
<221> DOMAIN
<222> (1)...(323)
<223> Cellulase (glycosyl hydrolase family 5)
<220>
<221> SITE
<222> (15)...(32)
<223> Cytosolic fatty-acid binding proteins signature. Prosite id = PS00214
<220>
<221> SITE
<222> (135)...(144)
<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659
<220>
<221> SITE
<222> (223)...(226)
<223> N-glycosylation site. Prosite id = PS00001
<400> 22

Met Val Ser Phe Lys Ala Gly Ile Asn Leu Gly Gly Trp Ile Ser Gln
1 5 10 15

Tyr Gln Val Phe Ser Lys Glu His Phe Asp Thr Phe Ile Thr Glu Lys
20 25 30

Asp Ile Glu Thr Ile Ala Glu Ala Gly Phe Asp His Val Arg Leu Pro
35 40 45

Phe Asp Tyr Pro Ile Ile Glu Ser Asp Asp Asn Val Gly Glu Tyr Lys
50 55 60

Glu Asp Gly Leu Ser Tyr Ile Asp Arg Cys Leu Glu Trp Cys Lys Lys
65 70 75 80

Tyr Asn Leu Gly Leu Val Leu Asp Met His His Ala Pro Gly Tyr Arg
85 90 95

Phe Gln Asp Phe Lys Thr Ser Thr Leu Phe Glu Asp Pro Asn Gln Gln
100 105 110

Lys Arg Phe Val Asp Ile Trp Arg Phe Leu Ala Lys Arg Tyr Ile Asn
115 120 125

Glu Arg Glu His Ile Ala Phe Glu Leu Leu Asn Glu Val Val Glu Pro
130 135 140

Asp Ser Thr Arg Trp Asn Lys Leu Met Leu Glu Cys Val Lys Ala Ile
145 150 155 160

Arg Glu Ile Asp Ser Thr Arg Trp Leu Tyr Ile Gly Gly Asn Asn Tyr
165 170 175

Asn Ser Pro Asp Glu Leu Lys Asn Leu Ala Asp Ile Asp Asp Asp Tyr
180 185 190

Ile Val Tyr Asn Phe His Phe Tyr Asn Pro Phe Phe Phe Thr His Gln
195 200 205

Lys Ala His Trp Ser Glu Ser Ala Met Ala Tyr Asn Arg Thr Val Lys
210 215 220

Tyr Pro Gly Gln Tyr Glu Gly Ile Glu Glu Phe Val Lys Asn Asn Pro
225 230 235 240

Lys Tyr Ser Phe Met Met Glu Leu Asn Asn Leu Lys Leu Asn Lys Glu
245 250 255

Leu Leu Arg Lys Asp Leu Lys Pro Ala Ile Glu Phe Arg Glu Lys Lys
260 265 270

Lys Cys Lys Leu Tyr Cys Gly Glu Phe Gly Val Ile Ala Ile Ala Asp
275 280 285

Leu Glu Ser Arg Ile Lys Trp His Glu Asp Tyr Ile Ser Leu Leu Glu
290 295 300

Glu Tyr Asp Ile Gly Gly Ala Val Trp Asn Tyr Lys Lys Met Asp Phe
305 310 315 320

Glu Ile Tyr Asn Glu Asp Arg Lys Pro Val Ser Gln Glu Leu Val Asn
325 330 335

Ile Leu Ala Arg Arg Lys Thr
340



<210> 23 
<211> 3966 
<212> DNA 

<213> Clostridium thermocellum
<400> 23 

atgtataaaa gattattgtc gtcagtactg ataattatgc tgttattatc agcctggtcg 60 
ccaatatccg tacaagcttc tgatggaatc aatgacatta gaggtcattg ggctgaagaa 120 
gacttgaaca aatggatgga aaaaggtatt ttggtgggct accaggatgg gacgataagg 180 
cccgataata atatcacaag agccgaattt gtcacattaa ttaacaaggt tttcgggctt 240 
tatgaattaa gccgggagca attcgcagat gttgaagact caaaatggta ttcccgtgaa 300 
atattaaaag ccagggctgc gggatatatt gcaggttatg gaagcaatgt tttcaaacct 360 
gacaattata ttacaagaca agaagccgtt gttataatcg cgaaagtttt tgaacttcaa 420 
agcggcagca attatacaag caagtttaaa gatggaagtc tggtaaagga atacgcaaaa 480 
gattccgtta gcgcgttggt tgaaaaaggc tacatagcag gttatgaaga tggcactttc 540
aggccggaca actacattac ccgtgcagaa acaataaaaa ttctgaataa aattattcct 600
tccttgtata acgagaaagg agattataaa aatgaagaag tagccggaaa cgctctgatt 660
aacaccgaag gagttatttt aaaagatacc gtaataaacg gggatttgta tcttgctcag 720
ggaattcaga acggcgatgt tacccttgac ggtgtgaatg taaaaggaac ggttttcgta 780
aatggtggag gaagcgacag catacatttt ataaatacga aaataaacag ggttgttgtc 840
aataaaacag gagttagaat tgtaacttcc ggcaatacct cggttgaaag tgttgtcgtt 900
aaatccggtg caaaacttga agaaaaagaa ttgacgggcg acggctttaa aaacgttaca 960
gtcgattctc aactttcagc cggcaatgaa ataatatttg tcggggattt tgaacaggtc 1020
gatgttctgg cggatgatgc cttgctggaa accaaagagg caaaaatgaa actgagaata 1080
ttcggccaaa ggattaaagt aaatggaaag gcaatagaaa aatcatcaaa gaactatatt 1140
gtaaacgggg aacttatatc aactgaggaa gaacccggtc cttccgacgc acccggtgcg 1200
gaagacgatc aaaattcagg tagtccgggc tcatcgacta atcctgcacc aaccaagaat 1260
ccgaatgaag agtggcgtct ggtttggagc gatgagttta acggttctga aataaatatg 1320
gctaattgga gctatgacga cccgaccaac ggaagatgga acggggaagt acaatcctac 1380
acacaaaaca atgcctatat caaagacggc gcgttggtta ttgaagcaag aaaagaagac 1440
attacggaac caagcggtga gacttatcat tatacatcgt caaagctgat taccaaaggc 1500
aaaaagtcat ggaagtacgg aaaatttgaa ataagggcaa aaatgccaca gggacaaggt 1560
atatggcctg caatctggat gatgccggaa gacgaaccct tctacggaac atggccaaag 1620
tgcggcgaaa tagatattat ggagcttttg ggccacgagc ctgataaaat ttatggaacg 1680
atccattttg gagagcctca taaagaatcc cagggaacgt ataccttgcc ggaaggccag 1740
acttttgctg atgatttcca cgtttattcg attgaatggg aaccgggaga aatacgctgg 1800
tatatagacg gcaagctgta tcatgtcgct aatgactggt actcgaggga cccgtacctt 1860
gccgatgact acacttatcc cgcacctttt gaccagaatt tcttcttgat tctcaatata 1920
tccgttggtg gcggctggcc gggatatcct gacgaaacga cagttttccc gcagcaaatg 1980
gttgtggact atgtgagagt atatcaaaaa gataaatatc ctcacaggga aaaaccggca 2040
aaggaagaag tgaagccaag agagcctctt gaggacggca attatatcta taacggcggt 2100
tttgatgtgg atgattctgc agcagttggt gtggacggtg ttccctatac gtcttactgg 2160
acattcttaa cagcatccgg tggagctgcg acagtcaatg tagaggaagg tgttatgcac 2220
gtacagatag aaaacggagg gacaaccgac tacggcgtac aattgcttca agctccgatt 2280
catcttgaaa aaggcgcaaa atataaagca tcttttgaca tgaaagctga aaatccaagg 2340
caggtaaaac tgaaaatagg cggagacggc gacaggggat ggaaagatta tgcggctatt 2400
ccaccgttta cggtctcaac agagatgacc aactatgagt ttgagtttac tatgaaagat 2460
gataccgatg ttaaggcacg gtttgagttt aatatgggtt tggacgataa tgatgtctgg 2520
attgacaatg ttaaactgat taaaacagaa gatgcgccgg ttatagatcc ttccgaaata 2580
gcaagacctc cgcttctttc cggcaactat atatacaacg gtacctttga ccaaggtccg 2640
aacagaatgg gattctggaa ttttgttgtg gatagcactg caaaggctac atactatatt 2700
ggaagcgatg ttaatgagcg caggtttgaa acaagaatag aaaaaggcgg aacatcgagg 2760
ggagccataa gattggttca gccgggaatt aacattgaaa acggcaaaac atacaaggtt 2820
agcttcgaag ccagtgcggc aaatacaaga actattgagg tggaaattgc aagcaatctt 2880
cacaacagca gcatttttgc gacaactttt gaaataagca aagagagcaa gatatacgaa 2940
tttgagttta caatggacaa agattcggac aagaacggag aacttaggtt caatctgggc 3000
ggaagcaacg tgaacgtcta tattgataat gtcgttatga aaagagtaag taccgatgaa 3060
gttgaaggaa acctgatttt aaacggcgta tttaacggcc tggcaggctg gggatatgga 3120
gcgtatgaac ctggatcggc agattttgaa agtcatgagg aacaatttag ggcaattatt 3180
agctctgtcg gtaatgaagg ttggaatgta cagttgtatc aggataatgt tccgctggaa 3240
caagggcaaa cctacgaagt ttcttttgat gcaaaatcaa cgattgacag aaagataatt 3300
gttcagctgc aaaggaacgg tacttcggat aataattggg actcctattt ctatcaagaa 3360
gttgaactta ctaatgaact taaaacattc aaatatgaat ttacaatgag taaacctaca 3420
gattcggcgt caagatttaa ttttgctttg ggtaatactg aaaacaaaac ttatgctcct 3480
catgaaataa taattgacaa tgttgtagta agaaaagttg cgactccttc tgcgctgata 3540
ttgaacggaa cctttgacga tggaatggat cattggctgc tatactgggg agacggtgaa 3600
ggcaattgcg atgtaactga cggagagctt gaaattaaca ttaccaaggt aggtaccgcg 3660
gattacatgc cgcagattaa acaggaaaac atagcgttgc aagagggtgt gacgtatact 3720
ttgtctctta aagcgagagc gcttgaggca agaagtatta aagtggacat attggattct 3780
tcttataact ggtatggcgg aactattttc gatttaacaa cggaagatgc cgtatacacg 3840
tttacattta cccaaagcaa gtcgataaat aacggtgtct taactataaa tttaggtacc 3900
atagaaggta agacatccgc cgcaactact gtctatcttg atgatatttt gctggaacaa 3960
cagtaa 3966



<210> 24 
<211> 1321 
<212> PRT 

<213> Clostridium thermocellum 
<220>
<221> SIGNAL
<222> (1)...(26)
<220>
<221> DOMAIN
<222> (30)...(71)
<223> S-layer homology domain
<220>
<221> DOMAIN
<222> (88)...(130)
<223> S-layer homology domain
<220>
<221> DOMAIN
<222> (149)...(192)
<223> S-layer homology domain
<220>
<221> DOMAIN
<222> (445)...(666)
<223> Glycosyl hydrolases family 16
<220>
<221> DOMAIN
<222> (693)...(849)
<223> Carbohydrate binding domain
<220>
<221> DOMAIN
<222> (868)...(1016)
<223> Carbohydrate binding domain
<220>
<221> DOMAIN
<222> (1023)...(1173)
<223> Carbohydrate binding domain
<220>
<221> DOMAIN
<222> (1177)...(1321)
<223> Carbohydrate binding domain
<220>
<221> SITE
<222> (146)...(149)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (285)...(288)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (296)...(299)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (322)...(325)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (440)...(443)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (448)...(451)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (648)...(651)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (886)...(889)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (976)...(979)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (1123)...(1126)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (1172)...(1175)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (1200)...(1203)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (1231)...(1234)
<223> N-glycosylation site. Prosite id = PS00001
<400> 24

Met Tyr Lys Arg Leu Leu Ser Ser Val Leu Ile Ile Met Leu Leu Leu
1 5 10 15

Ser Ala Trp Ser Pro Ile Ser Val Gln Ala Ser Asp Gly Ile Asn Asp
20 25 30

Ile Arg Gly His Trp Ala Glu Glu Asp Leu Asn Lys Trp Met Glu Lys
35 40 45

Gly Ile Leu Val Gly Tyr Gln Asp Gly Thr Ile Arg Pro Asp Asn Asn
50 55 60

Ile Thr Arg Ala Glu Phe Val Thr Leu Ile Asn Lys Val Phe Gly Leu
65 70 75 80

Tyr Glu Leu Ser Arg Glu Gln Phe Ala Asp Val Glu Asp Ser Lys Trp
85 90 95

Tyr Ser Arg Glu Ile Leu Lys Ala Arg Ala Ala Gly Tyr Ile Ala Gly
100 105 110

Tyr Gly Ser Asn Val Phe Lys Pro Asp Asn Tyr Ile Thr Arg Gln Glu
115 120 125

Ala Val Val Ile Ile Ala Lys Val Phe Glu Leu Gln Ser Gly Ser Asn
130 135 140

Tyr Thr Ser Lys Phe Lys Asp Gly Ser Leu Val Lys Glu Tyr Ala Lys
145 150 155 160

Asp Ser Val Ser Ala Leu Val Glu Lys Gly Tyr Ile Ala Gly Tyr Glu
165 170 175

Asp Gly Thr Phe Arg Pro Asp Asn Tyr Ile Thr Arg Ala Glu Thr Ile
180 185 190

Lys Ile Leu Asn Lys Ile Ile Pro Ser Leu Tyr Asn Glu Lys Gly Asp
195 200 205

Tyr Lys Asn Glu Glu Val Ala Gly Asn Ala Leu Ile Asn Thr Glu Gly
210 215 220

Val Ile Leu Lys Asp Thr Val Ile Asn Gly Asp Leu Tyr Leu Ala Gln
225 230 235 240

Gly Ile Gln Asn Gly Asp Val Thr Leu Asp Gly Val Asn Val Lys Gly
245 250 255

Thr Val Phe Val Asn Gly Gly Gly Ser Asp Ser Ile His Phe Ile Asn
260 265 270

Thr Lys Ile Asn Arg Val Val Val Asn Lys Thr Gly Val Arg Ile Val
275 280 285

Thr Ser Gly Asn Thr Ser Val Glu Ser Val Val Val Lys Ser Gly Ala
290 295 300






Lys Leu Glu Glu Lys Glu Leu Thr Gly Asp Gly Phe Lys Asn Val Thr
305 310 315 320

Val Asp Ser Gln Leu Ser Ala Gly Asn Glu Ile Ile Phe Val Gly Asp
325 330 335

Phe Glu Gln Val Asp Val Leu Ala Asp Asp Ala Leu Leu Glu Thr Lys
340 345 350

Glu Ala Lys Met Lys Leu Arg Ile Phe Gly Gln Arg Ile Lys Val Asn
355 360 365

Gly Lys Ala Ile Glu Lys Ser Ser Lys Asn Tyr Ile Val Asn Gly Glu
370 375 380

Leu Ile Ser Thr Glu Glu Glu Pro Gly Pro Ser Asp Ala Pro Gly Ala
385 390 395 400

Glu Asp Asp Gln Asn Ser Gly Ser Pro Gly Ser Ser Thr Asn Pro Ala
405 410 415

Pro Thr Lys Asn Pro Asn Glu Glu Trp Arg Leu Val Trp Ser Asp Glu
420 425 430

Phe Asn Gly Ser Glu Ile Asn Met Ala Asn Trp Ser Tyr Asp Asp Pro
435 440 445

Thr Asn Gly Arg Trp Asn Gly Glu Val Gln Ser Tyr Thr Gln Asn Asn
450 455 460

Ala Tyr Ile Lys Asp Gly Ala Leu Val Ile Glu Ala Arg Lys Glu Asp
465 470 475 480

Ile Thr Glu Pro Ser Gly Glu Thr Tyr His Tyr Thr Ser Ser Lys Leu
485 490 495

Ile Thr Lys Gly Lys Lys Ser Trp Lys Tyr Gly Lys Phe Glu Ile Arg
500 505 510

Ala Lys Met Pro Gln Gly Gln Gly Ile Trp Pro Ala Ile Trp Met Met
515 520 525

Pro Glu Asp Glu Pro Phe Tyr Gly Thr Trp Pro Lys Cys Gly Glu Ile
530 535 540

Asp Ile Met Glu Leu Leu Gly His Glu Pro Asp Lys Ile Tyr Gly Thr
545 550 555 560

Ile His Phe Gly Glu Pro His Lys Glu Ser Gln Gly Thr Tyr Thr Leu
565 570 575

Pro Glu Gly Gln Thr Phe Ala Asp Asp Phe His Val Tyr Ser Ile Glu
580 585 590

Trp Glu Pro Gly Glu Ile Arg Trp Tyr Ile Asp Gly Lys Leu Tyr His
595 600 605

Val Ala Asn Asp Trp Tyr Ser Arg Asp Pro Tyr Leu Ala Asp Asp Tyr
610 615 620

Thr Tyr Pro Ala Pro Phe Asp Gln Asn Phe Phe Leu Ile Leu Asn Ile
625 630 635 640

Ser Val Gly Gly Gly Trp Pro Gly Tyr Pro Asp Glu Thr Thr Val Phe
645 650 655

Pro Gln Gln Met Val Val Asp Tyr Val Arg Val Tyr Gln Lys Asp Lys
660 665 670

Tyr Pro His Arg Glu Lys Pro Ala Lys Glu Glu Val Lys Pro Arg Glu
675 680 685

Pro Leu Glu Asp Gly Asn Tyr Ile Tyr Asn Gly Gly Phe Asp Val Asp
690 695 700

Asp Ser Ala Ala Val Gly Val Asp Gly Val Pro Tyr Thr Ser Tyr Trp
705 710 715 720

Thr Phe Leu Thr Ala Ser Gly Gly Ala Ala Thr Val Asn Val Glu Glu
725 730 735

Gly Val Met His Val Gln Ile Glu Asn Gly Gly Thr Thr Asp Tyr Gly
740 745 750

Val Gln Leu Leu Gln Ala Pro Ile His Leu Glu Lys Gly Ala Lys Tyr
755 760 765



Lys Ala Ser Phe Asp Met Lys Ala Glu Asn Pro Arg Gln Val Lys Leu
770 775 780

Lys Ile Gly Gly Asp Gly Asp Arg Gly Trp Lys Asp Tyr Ala Ala Ile
785 790 795 800

Pro Pro Phe Thr Val Ser Thr Glu Met Thr Asn Tyr Glu Phe Glu Phe
805 810 815

Thr Met Lys Asp Asp Thr Asp Val Lys Ala Arg Phe Glu Phe Asn Met
820 825 830

Gly Leu Asp Asp Asn Asp Val Trp Ile Asp Asn Val Lys Leu Ile Lys
835 840 845

Thr Glu Asp Ala Pro Val Ile Asp Pro Ser Glu Ile Ala Arg Pro Pro
850 855 860

Leu Leu Ser Gly Asn Tyr Ile Tyr Asn Gly Thr Phe Asp Gln Gly Pro
865 870 875 880

Asn Arg Met Gly Phe Trp Asn Phe Val Val Asp Ser Thr Ala Lys Ala
885 890 895

Thr Tyr Tyr Ile Gly Ser Asp Val Asn Glu Arg Arg Phe Glu Thr Arg
900 905 910

Ile Glu Lys Gly Gly Thr Ser Arg Gly Ala Ile Arg Leu Val Gln Pro
915 920 925

Gly Ile Asn Ile Glu Asn Gly Lys Thr Tyr Lys Val Ser Phe Glu Ala
930 935 940

Ser Ala Ala Asn Thr Arg Thr Ile Glu Val Glu Ile Ala Ser Asn Leu
945 950 955 960

His Asn Ser Ser Ile Phe Ala Thr Thr Phe Glu Ile Ser Lys Glu Ser
965 970 975

Lys Ile Tyr Glu Phe Glu Phe Thr Met Asp Lys Asp Ser Asp Lys Asn
980 985 990

Gly Glu Leu Arg Phe Asn Leu Gly Gly Ser Asn Val Asn Val Tyr Ile
995 1000 1005

Asp Asn Val Val Met Lys Arg Val Ser Thr Asp Glu Val Glu Gly
1010 1015 1020



Asn Leu Ile Leu Asn Gly Val Phe Asn Gly Leu Ala Gly Trp Gly
1025 1030 1035

Tyr Gly Ala Tyr Glu Pro Gly Ser Ala Asp Phe Glu Ser His Glu
1040 1045 1050

Glu Gln Phe Arg Ala Ile Ile Ser Ser Val Gly Asn Glu Gly Trp
1055 1060 1065

Asn Val Gln Leu Tyr Gln Asp Asn Val Pro Leu Glu Gln Gly Gln
1070 1075 1080

Thr Tyr Glu Val Ser Phe Asp Ala Lys Ser Thr Ile Asp Arg Lys
1085 1090 1095

Ile Ile Val Gln Leu Gln Arg Asn Gly Thr Ser Asp Asn Asn Trp
1100 1105 1110

Asp Ser Tyr Phe Tyr Gln Glu Val Glu Leu Thr Asn Glu Leu Lys
1115 1120 1125

Thr Phe Lys Tyr Glu Phe Thr Met Ser Lys Pro Thr Asp Ser Ala
1130 1135 1140

Ser Arg Phe Asn Phe Ala Leu Gly Asn Thr Glu Asn Lys Thr Tyr
1145 1150 1155

Ala Pro His Glu Ile Ile Ile Asp Asn Val Val Val Arg Lys Val
1160 1165 1170

Ala Thr Pro Ser Ala Leu Ile Leu Asn Gly Thr Phe Asp Asp Gly
1175 1180 1185

Met Asp His Trp Leu Leu Tyr Trp Gly Asp Gly Glu Gly Asn Cys
1190 1195 1200

Asp Val Thr Asp Gly Glu Leu Glu Ile Asn Ile Thr Lys Val Gly
1205 1210 1215

Thr Ala Asp Tyr Met Pro Gln Ile Lys Gln Glu Asn Ile Ala Leu
1220 1225 1230

Gln Glu Gly Val Thr Tyr Thr Leu Ser Leu Lys Ala Arg Ala Leu
1235 1240 1245

Glu Ala Arg Ser Ile Lys Val Asp Ile Leu Asp Ser Ser Tyr Asn
1250 1255 1260

Trp Tyr Gly Gly Thr Ile Phe Asp Leu Thr Thr Glu Asp Ala Val
1265 1270 1275

Tyr Thr Phe Thr Phe Thr Gln Ser Lys Ser Ile Asn Asn Gly Val
1280 1285 1290

Leu Thr Ile Asn Leu Gly Thr Ile Glu Gly Lys Thr Ser Ala Ala
1295 1300 1305

Thr Thr Val Tyr Leu Asp Asp Ile Leu Leu Glu Gln Gln
1310 1315 1320

<210> 25 
<211> 1347 
<212> DNA 

<213> Clostridium thermocellum 
<400> 25 



atgtcaaaga taactttccc aaaagatttc atatggggtt ctgcaacagc agcatatcag 60
attgaaggtg catacaacga agacggcaaa ggtgaatcta tatgggaccg tttttcccac 120
acgccaggaa atatagcaga cggacatacc ggcgatgttg catgcgacca ctatcatcgt 180
tatgaagaag atatcaaaat aatgaaagaa atcggtatta aatcatacag gttttccatc 240
tcatggccca gaatctttcc tgaaggaaca ggtaaattaa atcaaaaggg actggatttt 300
tacaaaaggc tcacaaatct gcttctggaa aacggaatta tgcctgcaat cactctttat 360
cactgggacc ttccccaaaa gcttcaggat aaaggcggat ggaaaaaccg ggacaccacc 420
gattatttta cagaatactc tgaagtaata tttaaaaatc tcggagatat cgttccaata 480
tggtttactc acaatgaacc cggtgttgtt tctttgcttg gccacttttt aggaattcat 540
gcccctggga taaaagacct ccgcacttca ttggaagtct cgcacaatct tcttttgtcc 600
cacggcaagg ccgtgaaact gtttagagaa atgaatattg acgcccaaat tggaatagct 660
ctcaatttat cttaccatta tcccgcatcc gaaaaagctg aggatattga agcagcggaa 720
ttgtcatttt ctctggcggg aaggtggtat ctggatcctg tgctaaaagg ccggtatcct 780
gaaaacgcat tgaaacttta taaaaagaag ggtattgagc tttctttccc tgaagatgac 840
ctgaaactta tcagtcagcc aatagacttc atagcattca acaattattc ttcggaattt 900
ataaaatatg atccgtccag tgagtcaggt ttttcacctg caaactccat attagaaaag 960
ttcgaaaaaa cagatatggg ctggatcata tatcctgaag gcttgtatga tctgcttatg 1020
ctccttgaca gggattatgg aaagccaaac attgttatca gcgaaaacgg agccgccttc 1080
aaagatgaaa taggtagcaa cggaaagata gaagacacaa agagaatcca atatcttaaa 1140
gattatctga cccaggctca cagggcaatt caggacggtg taaacttaaa agcatactac 1200
ttgtggtcgc ttttggacaa ctttgaatgg gcttacgggt acaacaagag attcggaatc 1260
gttcacgtaa attttgatac gttggaaaga aaaataaagg atagcggcta ctggtacaaa 1320
gaagtaatca aaaacaacgg tttttaa 1347

<210> 26
<211> 448
<212> PRT
<213> Clostridium thermocellum
<220>
<221> DOMAIN
<222> (2)...(448)
<223> Glycosyl hydrolase family 1
<220>
<221> SITE
<222> (10)...(24)
<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653
<220>
<221> SITE
<222> (225)...(228)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (299)...(302)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (356)...(364)
<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572
<400> 26

Met Ser Lys Ile Thr Phe Pro Lys Asp Phe Ile Trp Gly Ser Ala Thr
1 5 10 15

Ala Ala Tyr Gln Ile Glu Gly Ala Tyr Asn Glu Asp Gly Lys Gly Glu
20 25 30

Ser Ile Trp Asp Arg Phe Ser His Thr Pro Gly Asn Ile Ala Asp Gly
35 40 45

His Thr Gly Asp Val Ala Cys Asp His Tyr His Arg Tyr Glu Glu Asp
50 55 60

Ile Lys Ile Met Lys Glu Ile Gly Ile Lys Ser Tyr Arg Phe Ser Ile
65 70 75 80

Ser Trp Pro Arg Ile Phe Pro Glu Gly Thr Gly Lys Leu Asn Gln Lys
85 90 95

Gly Leu Asp Phe Tyr Lys Arg Leu Thr Asn Leu Leu Leu Glu Asn Gly
100 105 110

Ile Met Pro Ala Ile Thr Leu Tyr His Trp Asp Leu Pro Gln Lys Leu
115 120 125

Gln Asp Lys Gly Gly Trp Lys Asn Arg Asp Thr Thr Asp Tyr Phe Thr
130 135 140

Glu Tyr Ser Glu Val Ile Phe Lys Asn Leu Gly Asp Ile Val Pro Ile
145 150 155 160

Trp Phe Thr His Asn Glu Pro Gly Val Val Ser Leu Leu Gly His Phe
165 170 175

Leu Gly Ile His Ala Pro Gly Ile Lys Asp Leu Arg Thr Ser Leu Glu
180 185 190

Val Ser His Asn Leu Leu Leu Ser His Gly Lys Ala Val Lys Leu Phe
195 200 205

Arg Glu Met Asn Ile Asp Ala Gln Ile Gly Ile Ala Leu Asn Leu Ser
210 215 220

Tyr His Tyr Pro Ala Ser Glu Lys Ala Glu Asp Ile Glu Ala Ala Glu
225 230 235 240

Leu Ser Phe Ser Leu Ala Gly Arg Trp Tyr Leu Asp Pro Val Leu Lys
245 250 255

Gly Arg Tyr Pro Glu Asn Ala Leu Lys Leu Tyr Lys Lys Lys Gly Ile
260 265 270

Glu Leu Ser Phe Pro Glu Asp Asp Leu Lys Leu Ile Ser Gln Pro Ile
275 280 285

Asp Phe Ile Ala Phe Asn Asn Tyr Ser Ser Glu Phe Ile Lys Tyr Asp
290 295 300

Pro Ser Ser Glu Ser Gly Phe Ser Pro Ala Asn Ser Ile Leu Glu Lys
305 310 315 320

Phe Glu Lys Thr Asp Met Gly Trp Ile Ile Tyr Pro Glu Gly Leu Tyr
325 330 335

Asp Leu Leu Met Leu Leu Asp Arg Asp Tyr Gly Lys Pro Asn Ile Val
340 345 350

Ile Ser Glu Asn Gly Ala Ala Phe Lys Asp Glu Ile Gly Ser Asn Gly
355 360 365

Lys Ile Glu Asp Thr Lys Arg Ile Gln Tyr Leu Lys Asp Tyr Leu Thr
370 375 380

Gln Ala His Arg Ala Ile Gln Asp Gly Val Asn Leu Lys Ala Tyr Tyr
385 390 395 400

Leu Trp Ser Leu Leu Asp Asn Phe Glu Trp Ala Tyr Gly Tyr Asn Lys
405 410 415

Arg Phe Gly Ile Val His Val Asn Phe Asp Thr Leu Glu Arg Lys Ile
420 425 430

Lys Asp Ser Gly Tyr Trp Tyr Lys Glu Val Ile Lys Asn Asn Gly Phe
435 440 445



<210> 27 
<211> 1362 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 27 



atggcaaaca agataacctt tcctgaaaat tttctgtggg gcgcggcaac ggcttcgtac 60
cagatcgaag gcgcctggaa caaacatggt aaaggcgaat ccacctggga tcgcttttca 120
cacacgcccg gtaagatcag gaacaacgat acgggcgatg tagcaaatga ccattatcgc 180
ctctggaaaa aagacattgg cttgatgaag aagatcgggt tgaaggctta tcgattttcc 240
atttcgtggc cgcgtattct tcctgctgga agaggcaagg tcaatcaaag agggctggat 300
ttttacaaca agatcgtaga tgagctgctg aaagcagata tcatcccatt tgttactctc 360
aatcactggg acctgcccca aaaactggaa gatgagggcg gctggccggc ccgttctact 420
gccgatgctt ttattgaata cacagatgtg atcacccgct cccttggcga ccgcgcaaag 480
aattggatca ctcacaatga acctgccgtc gttgcctgga tgggatactc cactggccaa 540
cacgcacccg gactgaagga ctatgggctt ggtgcccgcg ccgcgcatca cctgttgctc 600
tcacatggac aggctgtacc ggtcattcgc agcaatagcg cgggggcaga agtgggaatt 660
acgctcgata ttagctggcg gatcgctgcc tcaaacagcc gcgccgaccg ggagctggtc 720
cgtgaggatg atgggaggtg gttccgctgg tttgccgacc cgctttacgg gcgcggatat 780
ccctccgata aggtgtctga tttcactaag ttgggagcac tgcccaacgg acttgatttt 840
gtgcaggcag gcgacatgga cacgatcgcg acaccgactg attttatggg gctaaactac 900
tactcccgaa atgtctaccg cgcggacggt gcagataatg atccgcaaac tgttttccca 960
caaccgaaga tgcccgaaca ctggaccgag atgggctggg aaatttaccc ggatgggctg 1020
accaacattc tgggacgcgt ctatttcaac tatcagccgc gcaaactata cgtcacagaa 1080
aacggcgcca gttactccac gcctcctgat gataagggga atgtcgcgga tgaactccgc 1140
atccattatc tgaggacaca ttttgcagct gcctatcggg ccattcaaat gggcgtgcct 1200
ctggcaggat acttcgtctg gtccctcatg gacaactttg agtggtcatg gggctatatg 1260
caacgctttg gactcatctg ggtggattat gagacccaaa aacgcacttt aaaggatagc 1320
gcaaaatggt ataagcgcgt gatcaagaag aatgggctct aa 1362



<210> 28
<211> 453
<212> PRT
<213> Unknown
<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (3)...(453)
<223> Glycosyl hydrolase family 1
<220>
<221> SITE
<222> (11)...(25)
<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653
<220>
<221> SITE
<222> (49)...(52)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (361)...(369)
<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572
<400> 28

Met Ala Asn Lys Ile Thr Phe Pro Glu Asn Phe Leu Trp Gly Ala Ala
1 5 10 15

Thr Ala Ser Tyr Gln Ile Glu Gly Ala Trp Asn Lys His Gly Lys Gly
20 25 30

Glu Ser Thr Trp Asp Arg Phe Ser His Thr Pro Gly Lys Ile Arg Asn
35 40 45

Asn Asp Thr Gly Asp Val Ala Asn Asp His Tyr Arg Leu Trp Lys Lys
50 55 60

Asp Ile Gly Leu Met Lys Lys Ile Gly Leu Lys Ala Tyr Arg Phe Ser
65 70 75 80

Ile Ser Trp Pro Arg Ile Leu Pro Ala Gly Arg Gly Lys Val Asn Gln
85 90 95

Arg Gly Leu Asp Phe Tyr Asn Lys Ile Val Asp Glu Leu Leu Lys Ala
100 105 110

Asp Ile Ile Pro Phe Val Thr Leu Asn His Trp Asp Leu Pro Gln Lys
115 120 125

Leu Glu Asp Glu Gly Gly Trp Pro Ala Arg Ser Thr Ala Asp Ala Phe
130 135 140

Ile Glu Tyr Thr Asp Val Ile Thr Arg Ser Leu Gly Asp Arg Ala Lys
145 150 155 160

Asn Trp Ile Thr His Asn Glu Pro Ala Val Val Ala Trp Met Gly Tyr
165 170 175

Ser Thr Gly Gln His Ala Pro Gly Leu Lys Asp Tyr Gly Leu Gly Ala
180 185 190

Arg Ala Ala His His Leu Leu Leu Ser His Gly Gln Ala Val Pro Val
195 200 205

Ile Arg Ser Asn Ser Ala Gly Ala Glu Val Gly Ile Thr Leu Asp Ile
210 215 220

Ser Trp Arg Ile Ala Ala Ser Asn Ser Arg Ala Asp Arg Glu Leu Val
225 230 235 240

Arg Glu Asp Asp Gly Arg Trp Phe Arg Trp Phe Ala Asp Pro Leu Tyr
245 250 255

Gly Arg Gly Tyr Pro Ser Asp Lys Val Ser Asp Phe Thr Lys Leu Gly
260 265 270

Ala Leu Pro Asn Gly Leu Asp Phe Val Gln Ala Gly Asp Met Asp Thr
275 280 285

Ile Ala Thr Pro Thr Asp Phe Met Gly Leu Asn Tyr Tyr Ser Arg Asn
290 295 300

Val Tyr Arg Ala Asp Gly Ala Asp Asn Asp Pro Gln Thr Val Phe Pro
305 310 315 320

Gln Pro Lys Met Pro Glu His Trp Thr Glu Met Gly Trp Glu Ile Tyr
325 330 335

Pro Asp Gly Leu Thr Asn Ile Leu Gly Arg Val Tyr Phe Asn Tyr Gln
340 345 350

Pro Arg Lys Leu Tyr Val Thr Glu Asn Gly Ala Ser Tyr Ser Thr Pro
355 360 365

Pro Asp Asp Lys Gly Asn Val Ala Asp Glu Leu Arg Ile His Tyr Leu
370 375 380

Arg Thr His Phe Ala Ala Ala Tyr Arg Ala Ile Gln Met Gly Val Pro
385 390 395 400

Leu Ala Gly Tyr Phe Val Trp Ser Leu Met Asp Asn Phe Glu Trp Ser
405 410 415

Trp Gly Tyr Met Gln Arg Phe Gly Leu Ile Trp Val Asp Tyr Glu Thr
420 425 430

Gln Lys Arg Thr Leu Lys Asp Ser Ala Lys Trp Tyr Lys Arg Val Ile
435 440 445

Lys Lys Asn Gly Leu
450



<210> 29 
<211> 1362 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 29 

atggcgaaca aaattacctt tcccgaaaat tttctttggg gcgcggcaac agcctcctac 60
cagatcgaag gtgcgtggga caaacatggc aagggtgaat ccatctggga tcgcttttcg 120
catacccctg gcaagatcag aaataatgat acgggcgatg ttgccaatga tcattatcgt 180
ctctggaaaa aagacattgg cttgatgaag aagatcggct tgaaggcata tcgtttttcc 240
atttcgtggc cgcgtgttct tcccgccgga cgcggcaaag tcaatcagaa gggactggat 300
ttctataaca ggctggtaga tgctctgttg aaagaagata tcatcccatt tgtgactctc 360
aatcactggg acctgcccca aaagctggag gaggaaggcg gttggccggt tcgctccacc 420
gcagatgcct ttgtggaata cacagacgtg gtcacacgtt ccctcggcga ccgcgtaaag 480
aattggatca cgcataatga gcctgccgtc gttgcctgga tgggatattc cacaggtcaa 540
cacgcacccg gtttgaagga ctatgggctt ggtgtgcgcg ccgcgcatca tctgctgctc 600
tcccacgggc aggcggtgcc agtcatccgc agtaacagcg ccgatgcaga agtgggcatt 660
acgctggata ttagctggcg gattcctgcc tccaatagcc gagcagaccg ggaattggtc 720
cgtaaagatg acggactatg gttccgctgg ttcgccgatc cgctttatgg gcgcggatac 780
ccctcggata aagtcaccga ttttacaaag atcggcgcgc tgcccaatgg tctggacttt 840
atgcaagccg gtgatatgga tgcgatcgcc acgccaaccg atttcatggg gctgaactat 900
tatttccgaa atgtctaccg cgcgaatggc gaagacaatg atccgcaggt cgttttccca 960
caaccaaaga tgcccgaaca ctggacggag atgggctggg aaatctatcc ggatggactg 1020
acgaacatcc tgggacgcgt ttatttcaat taccagccac ataaactgta tatcacagag 1080
aacggcgcga gctactccac cccgcccgat gaaaagggga atgtcgccga tgagctccgc 1140
actcattatt tacggacaca cttcgcggct gcctaccggg cgattcagat gggcgtgcct 1200
ctggcaggat actttgtctg gtccctcatg gacaactttg agtggtcctg gggatatatg 1260
cagcgctttg ggctcatctg ggtggactac gagacacaga aacgcaccct gaaggatagc 1320
gccaagtggt acaaacgtgt gatcaggaag aatgggtttt ag 1362



<210> 30
<211> 453
<212> PRT
<213> Unknown
<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (3)...(453)
<223> Glycosyl hydrolase family 1
<220>
<221> SITE
<222> (11)...(25)
<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653
<220>
<221> SITE
<222> (49)...(52)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (361)...(369)
<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572
<400> 30

Met Ala Asn Lys Ile Thr Phe Pro Glu Asn Phe Leu Trp Gly Ala Ala
1 5 10 15

Thr Ala Ser Tyr Gln Ile Glu Gly Ala Trp Asp Lys His Gly Lys Gly
20 25 30

Glu Ser Ile Trp Asp Arg Phe Ser His Thr Pro Gly Lys Ile Arg Asn
35 40 45

Asn Asp Thr Gly Asp Val Ala Asn Asp His Tyr Arg Leu Trp Lys Lys
50 55 60

Asp Ile Gly Leu Met Lys Lys Ile Gly Leu Lys Ala Tyr Arg Phe Ser
65 70 75 80

Ile Ser Trp Pro Arg Val Leu Pro Ala Gly Arg Gly Lys Val Asn Gln
85 90 95

Lys Gly Leu Asp Phe Tyr Asn Arg Leu Val Asp Ala Leu Leu Lys Glu
100 105 110

Asp Ile Ile Pro Phe Val Thr Leu Asn His Trp Asp Leu Pro Gln Lys
115 120 125

Leu Glu Glu Glu Gly Gly Trp Pro Val Arg Ser Thr Ala Asp Ala Phe
130 135 140

Val Glu Tyr Thr Asp Val Val Thr Arg Ser Leu Gly Asp Arg Val Lys
145 150 155 160

Asn Trp Ile Thr His Asn Glu Pro Ala Val Val Ala Trp Met Gly Tyr
165 170 175

Ser Thr Gly Gln His Ala Pro Gly Leu Lys Asp Tyr Gly Leu Gly Val
180 185 190

Arg Ala Ala His His Leu Leu Leu Ser His Gly Gln Ala Val Pro Val
195 200 205

Ile Arg Ser Asn Ser Ala Asp Ala Glu Val Gly Ile Thr Leu Asp Ile
210 215 220

Ser Trp Arg Ile Pro Ala Ser Asn Ser Arg Ala Asp Arg Glu Leu Val
225 230 235 240

Arg Lys Asp Asp Gly Leu Trp Phe Arg Trp Phe Ala Asp Pro Leu Tyr
245 250 255

Gly Arg Gly Tyr Pro Ser Asp Lys Val Thr Asp Phe Thr Lys Ile Gly
260 265 270

Ala Leu Pro Asn Gly Leu Asp Phe Met Gln Ala Gly Asp Met Asp Ala
275 280 285

Ile Ala Thr Pro Thr Asp Phe Met Gly Leu Asn Tyr Tyr Phe Arg Asn
290 295 300

Val Tyr Arg Ala Asn Gly Glu Asp Asn Asp Pro Gln Val Val Phe Pro
305 310 315 320

Gln Pro Lys Met Pro Glu His Trp Thr Glu Met Gly Trp Glu Ile Tyr
325 330 335

Pro Asp Gly Leu Thr Asn Ile Leu Gly Arg Val Tyr Phe Asn Tyr Gln
340 345 350

Pro His Lys Leu Tyr Ile Thr Glu Asn Gly Ala Ser Tyr Ser Thr Pro
355 360 365

Pro Asp Glu Lys Gly Asn Val Ala Asp Glu Leu Arg Thr His Tyr Leu
370 375 380

Arg Thr His Phe Ala Ala Ala Tyr Arg Ala Ile Gln Met Gly Val Pro
385 390 395 400

Leu Ala Gly Tyr Phe Val Trp Ser Leu Met Asp Asn Phe Glu Trp Ser
405 410 415

Trp Gly Tyr Met Gln Arg Phe Gly Leu Ile Trp Val Asp Tyr Glu Thr
420 425 430

Gln Lys Arg Thr Leu Lys Asp Ser Ala Lys Trp Tyr Lys Arg Val Ile
435 440 445

Arg Lys Asn Gly Phe
450

<210> 31 
<211> 1167 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<400> 31 



atggaagacc gcccgcacta ttacagcgac gaccatctct ggggtgtact gtgcgtgacc 60
gcctacatca aggaaactgg ggactttgca ttcctggacg agaaagttca cttttacgag 120
aaggacccgg tcgagggcgt gtctgtgctg gatcacgtta aacgggcctt gacctttacc 180
cgcaacaaca tcgggaaaca tggtctgcct ctcctcggct ttgcggattg gaacgacacg 240
atcaatctgg cgaagggcgc cgagtctctt ttcacgtcgc atctatatgg acgcgcgctg 300
ctggagttta ttgatctgct cacatatctt ggcaagaacg atgaagccga tgaatggcag 360
cgagcccacg ttgagatgca gtcccgcgtc gaaaaacatg cctgggatgg cgaatggtat 420
ttcatgtact ttgaccacga cggcagcccg gttgggtctc acacgaatca gtatggaaag 480
atccatctca acggacagag ctgggctgtg ctttcgggct ttgcctctcc gcagcgcgcc 540
cgccaggcca tggactcggt ttacaagcat ctcaacacaa agcacggcat caagctctcc 600
acgccgggct acaatggcta tgaccccaac tacggcggcg tgaccaccta cccaccggga 660
gcaaaggaaa acggcggcat cttcctgcac ccgaatccct gggccatgat cgcagagacc 720
atgctcgggg atggcgatcg cgcctacgag tattactcgc agatcaaccc ggccggcaag 780
aacgatgaca tcgacctgta cgaggtcgag ccatatgttt acgctcaaaa catcctgggc 840
gatgagcatc cgcagttcgg gctgggacgc aactcgtggc tctcgggtac ggcatcctgg 900
tgctatcagg ctgccacaca gtggatcctc ggaatccgcg ccgactatga agggctgcgc 960
atcgacccgt gcattccgtc caagtgggat gggttcaagg caacgcgcct gtatcgcggc 1020
gtgaagtaca acattacggt caccaacccg aagcacatct gcaaaggcgt ggaaaaagtt 1080
ctggtcaacg gcaaaccggt tgaggggaat gtggtccggg cagacgtggg tttgcgcgaa 1140
gtgaacgtgg aagttacctt aggataa 1167



<210> 32 
<211> 388 
<212> PRT 
<213> Unknown 

<220> 
<223> Obtained from environmental sample
<220>

<221> SITE

<222> (79) . . . (82)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (349) . . . (352)

<223> N-glycosylation site. Prosite id = PS00001
<400> 32 

Met Glu Asp Arg Pro His Tyr Tyr Ser Asp Asp His Leu Trp Gly Val
1 5 10 15

Leu Cys Val Thr Ala Tyr Ile Lys Glu Thr Gly Asp Phe Ala Phe Leu
20 25 30

Asp Glu Lys Val His Phe Tyr Glu Lys Asp Pro Val Glu Gly Val Ser
35 40 45

Val Leu Asp His Val Lys Arg Ala Leu Thr Phe Thr Arg Asn Asn Ile
50 55 60

Gly Lys His Gly Leu Pro Leu Leu Gly Phe Ala Asp Trp Asn Asp Thr
65 70 75 80

Ile Asn Leu Ala Lys Gly Ala Glu Ser Leu Phe Thr Ser His Leu Tyr
85 90 95

Gly Arg Ala Leu Leu Glu Phe Ile Asp Leu Leu Thr Tyr Leu Gly Lys
100 105 110

Asn Asp Glu Ala Asp Glu Trp Gln Arg Ala His Val Glu Met Gln Ser
115 120 125

Arg Val Glu Lys His Ala Trp Asp Gly Glu Trp Tyr Phe Met Tyr Phe
130 135 140

Asp His Asp Gly Ser Pro Val Gly Ser His Thr Asn Gln Tyr Gly Lys
145 150 155 160

Ile His Leu Asn Gly Gln Ser Trp Ala Val Leu Ser Gly Phe Ala Ser
165 170 175

Pro Gln Arg Ala Arg Gln Ala Met Asp Ser Val Tyr Lys His Leu Asn
180 185 190



WO 2006/101584



58/248 



PCT/US2006/002516 



Thr Lys His Gly Ile Lys Leu Ser Thr Pro Gly Tyr Asn Gly Tyr Asp
195 200 205

Pro Asn Tyr Gly Gly Val Thr Thr Tyr Pro Pro Gly Ala Lys Glu Asn
210 215 220

Gly Gly Ile Phe Leu His Pro Asn Pro Trp Ala Met Ile Ala Glu Thr
225 230 235 240

Met Leu Gly Asp Gly Asp Arg Ala Tyr Glu Tyr Tyr Ser Gln Ile Asn
245 250 255

Pro Ala Gly Lys Asn Asp Asp Ile Asp Leu Tyr Glu Val Glu Pro Tyr
260 265 270

Val Tyr Ala Gln Asn Ile Leu Gly Asp Glu His Pro Gln Phe Gly Leu
275 280 285

Gly Arg Asn Ser Trp Leu Ser Gly Thr Ala Ser Trp Cys Tyr Gln Ala
290 295 300

Ala Thr Gln Trp Ile Leu Gly Ile Arg Ala Asp Tyr Glu Gly Leu Arg
305 310 315 320

Ile Asp Pro Cys Ile Pro Ser Lys Trp Asp Gly Phe Lys Ala Thr Arg
325 330 335

Leu Tyr Arg Gly Val Lys Tyr Asn Ile Thr Val Thr Asn Pro Lys His
340 345 350

Ile Cys Lys Gly Val Glu Lys Val Leu Val Asn Gly Lys Pro Val Glu
355 360 365

Gly Asn Val Val Arg Ala Asp Val Gly Leu Arg Glu Val Asn Val Glu
370 375 380

Val Thr Leu Gly
385



<210> 33 
<211> 1362 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 



<400> 33 
atggcaaata aaattctctt ccccgagaac tttctctggg gcacggcgac cgcatcctac 60
cagatcgagg gggcttggga taaacatggt aagggcgagt cgacctggga ccgttttacg 120
catacacctg gaaagatcaa aaacaatgat acgggcgatg tagcagatga ccattatcga 180
ttatggaaaa aagatatcgg cttgatgaag aagctcggct tgaaggctta tcgtttttcg 240
acttcctggc cgcgggtgct gccggccggg cgcggtaaga gcaatcaaaa aggactcgat 300
ttctacagca agctggttga tgagttgcta aaagcaaata tcatcccatt cgtgacattg 360
aatcactggg acatcccaca aaagttggag gacgagggtg gctgggccgt gcgctcaacg 420
gctgaggcat ttgtggaata tgccgatctc atgtcgcgca cgcttggaga ccgcgtcaag 480
aactggatca cgcacaacga accggccgtc gtcgcctgga tgggatacgg gatgggcatc 540
cacgcgccgg gcttaacgga tttctcgatt gcggtgccgg tctcgcatca tctgctcctt 600
tcgcacggat gggccgtgcc tgtgattcgc ggtaacagcc cggatgccga ggtgggcatt 660
accctcaaca ttcaatgggg cgaagcagca tccaacagcc gggccgacct aaacgccctg 720
cgcctgaacg atggacagtg gttccgctgg tttgccgatc cggtttatgg ccgcggctat 780
ccttccgacg tggtggctga tttcgagaaa atgggcgcgc tgccgaacgg catgaatttc 840
gtgcaacctg gcgatatgga tgtcatcgcc acgccaaccg atttcctcgg gctcaattat 900
tattcccgcc atgtgcatcg cgtcaacaca ccggataacg atcaacaggt tgtgtttgcc 960
aaacagcagg gtcccgagaa ctggaccgag atgggctggg agatccatcc tgatggattg 1020
gccggaattt tatccagagc gtatttcaat taccagccgc gcaaagtata tgtgactgaa 1080
aacggtgcca gctattccac cgcgcccgat gagaatggta ttgtcaacga cattcaccgc 1140
gtcaattatc tacggacgca cttcgcggct gcccatcgcg ccctgcaggc gggcgtgcca 1200
ttggcaggat acttcgtctg gtcaatgctc gataacttcg aatggagtca cgggtacagc 1260
cagcgctttg gcatcgttta tgtggactat caaacccaga agcgttactt gaaagacagc 1320
gccaagtggt acaaaggtgt catcaaaaag aatgggttct aa 1362



<210> 34 
<211> 453 
<212> PRT 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 

<222> (3) . . . (453) 

<223> Glycosyl hydrolase family 1 

<220> 

<221> SITE 
<222> (11) . . . (25)

<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<220>

<221> SITE

<222> (49) . . . (52)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (332) . . . (335)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (361) . . . (369)

<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572
<400> 34

Met Ala Asn Lys Ile Leu Phe Pro Glu Asn Phe Leu Trp Gly Thr Ala
1 5 10 15

Thr Ala Ser Tyr Gln Ile Glu Gly Ala Trp Asp Lys His Gly Lys Gly
20 25 30

Glu Ser Thr Trp Asp Arg Phe Thr His Thr Pro Gly Lys Ile Lys Asn
35 40 45

Asn Asp Thr Gly Asp Val Ala Asp Asp His Tyr Arg Leu Trp Lys Lys
50 55 60

Asp Ile Gly Leu Met Lys Lys Leu Gly Leu Lys Ala Tyr Arg Phe Ser
65 70 75 80

Thr Ser Trp Pro Arg Val Leu Pro Ala Gly Arg Gly Lys Ser Asn Gln
85 90 95

Lys Gly Leu Asp Phe Tyr Ser Lys Leu Val Asp Glu Leu Leu Lys Ala
100 105 110

Asn Ile Ile Pro Phe Val Thr Leu Asn His Trp Asp Ile Pro Gln Lys
115 120 125

Leu Glu Asp Glu Gly Gly Trp Ala Val Arg Ser Thr Ala Glu Ala Phe
130 135 140

Val Glu Tyr Ala Asp Leu Met Ser Arg Thr Leu Gly Asp Arg Val Lys
145 150 155 160

Asn Trp Ile Thr His Asn Glu Pro Ala Val Val Ala Trp Met Gly Tyr
165 170 175

Gly Met Gly Ile His Ala Pro Gly Leu Thr Asp Phe Ser Ile Ala Val
180 185 190

Pro Val Ser His His Leu Leu Leu Ser His Gly Trp Ala Val Pro Val
195 200 205

Ile Arg Gly Asn Ser Pro Asp Ala Glu Val Gly Ile Thr Leu Asn Ile
210 215 220

Gln Trp Gly Glu Ala Ala Ser Asn Ser Arg Ala Asp Leu Asn Ala Leu
225 230 235 240

Arg Leu Asn Asp Gly Gln Trp Phe Arg Trp Phe Ala Asp Pro Val Tyr
245 250 255

Gly Arg Gly Tyr Pro Ser Asp Val Val Ala Asp Phe Glu Lys Met Gly
260 265 270

Ala Leu Pro Asn Gly Met Asn Phe Val Gln Pro Gly Asp Met Asp Val
275 280 285

Ile Ala Thr Pro Thr Asp Phe Leu Gly Leu Asn Tyr Tyr Ser Arg His
290 295 300

Val His Arg Val Asn Thr Pro Asp Asn Asp Gln Gln Val Val Phe Ala
305 310 315 320

Lys Gln Gln Gly Pro Glu Asn Trp Thr Glu Met Gly Trp Glu Ile His
325 330 335

Pro Asp Gly Leu Ala Gly Ile Leu Ser Arg Ala Tyr Phe Asn Tyr Gln
340 345 350

Pro Arg Lys Val Tyr Val Thr Glu Asn Gly Ala Ser Tyr Ser Thr Ala
355 360 365

Pro Asp Glu Asn Gly Ile Val Asn Asp Ile His Arg Val Asn Tyr Leu
370 375 380

Arg Thr His Phe Ala Ala Ala His Arg Ala Leu Gln Ala Gly Val Pro
385 390 395 400

Leu Ala Gly Tyr Phe Val Trp Ser Met Leu Asp Asn Phe Glu Trp Ser
405 410 415

His Gly Tyr Ser Gln Arg Phe Gly Ile Val Tyr Val Asp Tyr Gln Thr
420 425 430

Gln Lys Arg Tyr Leu Lys Asp Ser Ala Lys Trp Tyr Lys Gly Val Ile
435 440 445

Lys Lys Asn Gly Phe
450

<210> 35 
<211> 1116 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 



<400> 35 



atgaataaaa tcctcaaact cttcagcagc ctgctgcttt ttgcaggcat ctgtcccgcg 60
cttcaggcag agccagtaga aacctacttt cccctgtccc gcgggatcaa catgagccac 120
tggctctctc aagtgaatga aaacattccc gaccgttcca cctatgtgac ggagcgggat 180
ttgcaatttc tgcgggcagc cggtttcgac catgtgcgtc tgccaatcga tgaggtcgaa 240
ctctgggatg aagagggcaa tcagatcgag gaggcctggc aatacatgca taactttctc 300
cgttggagcc gaaagaacga tctccgggtc attctcgacc tgcacacggt attgtcccac 360
cacttcaacg cggtaaatat gggagaggtc aatacactct tcaatgatcc cagggaacag 420
gaaaagttcc tcaacctatg ggaacaaatc atggatgccg tgggtcacca tccgaatgag 480
tttctcgcct atgaaatgct caatgaggcg gtcgcggaag atgatgaaga ctggaatctg 540
ctcctcaacc gcgccattgt ccgcatccgg gaccgtgagc cttatcgggt gctgattgcg 600
gggtcgaact ggtggcagca tgccgaccgg gtccccaacc tgaggctccc gaaaggagac 660
cccaatatca tcatcagttt tcatttttat tccccttttc tcttcaccca ctaccgcagt 720
agctggactg cgatgcaggc gtaccagggc ttcgtccaat accctggcaa aaccatacct 780
tccatacatc tcgaaggcat gaactacccg gagtccttcg ttcatatgtg ggaagcgcac 840
aatcggtact atgacatcca ttccatgtat gccgaaatgg tcccggcggt gcgttttgcc 900
gaaaagttgg gacttcggct ctattgcgga gaattcgggg ccatgaagac cgttgatcgc 960
gcccagatgc tgcagtggta tcgggatgtt gtcactgtat ttaataaatt gggtattccc 1020
tatactgcct gggattatca gggaaccttc ggaatccgcg atgagctgac cggtgagccc 1080
gatcatgaaa tgatcgatat tctcctcggg cgctga 1116
<210> 36
<211> 371
<212> PRT
<213> Unknown

<220>

<223> Obtained from environmental sample
<220>

<221> SIGNAL
<222> (1) . . . (23)

<220>

<221> DOMAIN
<222> (39) . . . (350)

<223> Cellulase (glycosyl hydrolase family 5)
<220>

<221> SITE

<222> (37) . . . (40)

<223> N-glycosylation site. Prosite id = PS00001
<400> 36 

Met Asn Lys Ile Leu Lys Leu Phe Ser Ser Leu Leu Leu Phe Ala Gly
1 5 10 15

Ile Cys Pro Ala Leu Gln Ala Glu Pro Val Glu Thr Tyr Phe Pro Leu
20 25 30

Ser Arg Gly Ile Asn Met Ser His Trp Leu Ser Gln Val Asn Glu Asn
35 40 45

Ile Pro Asp Arg Ser Thr Tyr Val Thr Glu Arg Asp Leu Gln Phe Leu
50 55 60

Arg Ala Ala Gly Phe Asp His Val Arg Leu Pro Ile Asp Glu Val Glu
65 70 75 80

Leu Trp Asp Glu Glu Gly Asn Gln Ile Glu Glu Ala Trp Gln Tyr Met
85 90 95

His Asn Phe Leu Arg Trp Ser Arg Lys Asn Asp Leu Arg Val Ile Leu
100 105 110

Asp Leu His Thr Val Leu Ser His His Phe Asn Ala Val Asn Met Gly
115 120 125

Glu Val Asn Thr Leu Phe Asn Asp Pro Arg Glu Gln Glu Lys Phe Leu
130 135 140

Asn Leu Trp Glu Gln Ile Met Asp Ala Val Gly His His Pro Asn Glu
145 150 155 160

Phe Leu Ala Tyr Glu Met Leu Asn Glu Ala Val Ala Glu Asp Asp Glu
165 170 175

Asp Trp Asn Leu Leu Leu Asn Arg Ala Ile Val Arg Ile Arg Asp Arg
180 185 190

Glu Pro Tyr Arg Val Leu Ile Ala Gly Ser Asn Trp Trp Gln His Ala
195 200 205

Asp Arg Val Pro Asn Leu Arg Leu Pro Lys Gly Asp Pro Asn Ile Ile
210 215 220

Ile Ser Phe His Phe Tyr Ser Pro Phe Leu Phe Thr His Tyr Arg Ser
225 230 235 240

Ser Trp Thr Ala Met Gln Ala Tyr Gln Gly Phe Val Gln Tyr Pro Gly
245 250 255

Lys Thr Ile Pro Ser Ile His Leu Glu Gly Met Asn Tyr Pro Glu Ser
260 265 270

Phe Val His Met Trp Glu Ala His Asn Arg Tyr Tyr Asp Ile His Ser
275 280 285

Met Tyr Ala Glu Met Val Pro Ala Val Arg Phe Ala Glu Lys Leu Gly
290 295 300

Leu Arg Leu Tyr Cys Gly Glu Phe Gly Ala Met Lys Thr Val Asp Arg
305 310 315 320

Ala Gln Met Leu Gln Trp Tyr Arg Asp Val Val Thr Val Phe Asn Lys
325 330 335

Leu Gly Ile Pro Tyr Thr Ala Trp Asp Tyr Gln Gly Thr Phe Gly Ile
340 345 350

Arg Asp Glu Leu Thr Gly Glu Pro Asp His Glu Met Ile Asp Ile Leu
355 360 365

Leu Gly Arg
370



<210> 37 
<211> 1383 
<212> DNA 
<213> Unknown
<220> 

<223> Obtained from environmental sample 



<400> 37 



atgagcaaac tccccaaatt cctctttgga gccggcacct caagttatca gatcgaaggt 60
gcctggaata tagatggcaa aggtccctcc atttgggatt tccacactcg ccatcccggc 120
gcggtttatc ggatgcacaa cggggatatg gcctgcgatc attatcatcg gtatcgaacg 180
gatatcgagc tgatgcagaa gatcggccta gaggcttacc gcttttccat aaactggccc 240
cgggttctgc cggaagggac cggtgccgcc aatgaagcag gtctggactt ttacgaccgg 300
ctggtggacg cactgttgga agcgggaatt cagccttgga tcacccttta tcactgggaa 360
ctcccctggg ctctccacct gcgcgggggt tggctcaatc gggacatgcc cgaccacatt 420
gagaactacg ccgccttggt cgccaggtgc ctcggtgacc gggtgaaaaa ctggattact 480
ttgaatgagc ctcaggtttt catcgggctt ggctatgcca gcggggttca tgcccccggc 540
tataagttgt ccttgcggga gtgcctggtc ggttcccacc atgccgtgct ttcccaccac 600
cgggcagtca aggcgatccg ggccaactgc gaaggcagcg tccagatcgg ctcagccccg 660
gtgggtgttg tctgccgacc ggaaacggag tcggcagcag acattgaggc tgcccgccag 720
gccacctacc atatcaacac tcccagcacc cacactcccg acaatctgat cggctgcctc 780
tggaacagca cttggtggat agatccaatg gttctgggga agtatccgga acacgggctg 840
aaagcctttg aaagctatct gccggacaac attcaggccg aactggatgc cgtattcgaa 900
ccgacggact ttgtcggttc caacatctac cacggccgca cggtgcgggc caagcaggat 960
ggtggttttg agtttatcga ccttccgccc ggcagccccc gcaccaccat gggctgggac 1020
atcaccccgg acatcctcta ctggggagga aagtatcttt acgaacgcta tggcaagccg 1080
atgtttatca cggaaaacgg cattgccgtc ccggaactgg tgaatgatga aggccaggtc 1140
gaggataccg tccgtgagca atacatgaag ctgcacctgc gtgggctgca gcgggcccgc 1200
gatgaaggca tcccctatgc cggatacttc cactggtccc tgctcgacaa cttcgagtgg 1260
gaacaaggct actcccagcg ctttggcatg gtctacgtcg actaccagac ccaggaacgt 1320
atcctcaaac gttcgggcca gcatttcgct gccatcgtcc gggaaatcac cggaaccgcc 1380
taa 1383



<210> 38 
<211> 460 
<212> PRT 
<213> Unknown



<220> 

<223> Obtained from environmental sample 
<220>

<221> DOMAIN
<222> (1) . . . (458)

<223> Glycosyl hydrolase family 1
<220>

<221> SITE
<222> (7) . . . (21)

<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<220>

<221> SITE

<222> (266) . . . (269)

<223> N-glycosylation site. Prosite id = PS00001

<220> 

<221> SITE 

<222> (356) . . . (374) 

<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572 
<400> 38 

Met Ser Lys Leu Pro Lys Phe Leu Phe Gly Ala Gly Thr Ser Ser Tyr
1 5 10 15

Gln Ile Glu Gly Ala Trp Asn Ile Asp Gly Lys Gly Pro Ser Ile Trp
20 25 30

Asp Phe His Thr Arg His Pro Gly Ala Val Tyr Arg Met His Asn Gly
35 40 45

Asp Met Ala Cys Asp His Tyr His Arg Tyr Arg Thr Asp Ile Glu Leu
50 55 60

Met Gln Lys Ile Gly Leu Glu Ala Tyr Arg Phe Ser Ile Asn Trp Pro
65 70 75 80

Arg Val Leu Pro Glu Gly Thr Gly Ala Ala Asn Glu Ala Gly Leu Asp
85 90 95

Phe Tyr Asp Arg Leu Val Asp Ala Leu Leu Glu Ala Gly Ile Gln Pro
100 105 110

Trp Ile Thr Leu Tyr His Trp Glu Leu Pro Trp Ala Leu His Leu Arg
115 120 125

Gly Gly Trp Leu Asn Arg Asp Met Pro Asp His Ile Glu Asn Tyr Ala
130 135 140

Ala Leu Val Ala Arg Cys Leu Gly Asp Arg Val Lys Asn Trp Ile Thr
145 150 155 160

Leu Asn Glu Pro Gln Val Phe Ile Gly Leu Gly Tyr Ala Ser Gly Val
165 170 175

His Ala Pro Gly Tyr Lys Leu Ser Leu Arg Glu Cys Leu Val Gly Ser
180 185 190

His His Ala Val Leu Ser His His Arg Ala Val Lys Ala Ile Arg Ala
195 200 205

Asn Cys Glu Gly Ser Val Gln Ile Gly Ser Ala Pro Val Gly Val Val
210 215 220

Cys Arg Pro Glu Thr Glu Ser Ala Ala Asp Ile Glu Ala Ala Arg Gln
225 230 235 240

Ala Thr Tyr His Ile Asn Thr Pro Ser Thr His Thr Pro Asp Asn Leu
245 250 255

Ile Gly Cys Leu Trp Asn Ser Thr Trp Trp Ile Asp Pro Met Val Leu
260 265 270

Gly Lys Tyr Pro Glu His Gly Leu Lys Ala Phe Glu Ser Tyr Leu Pro
275 280 285

Asp Asn Ile Gln Ala Glu Leu Asp Ala Val Phe Glu Pro Thr Asp Phe
290 295 300

Val Gly Ser Asn Ile Tyr His Gly Arg Thr Val Arg Ala Lys Gln Asp
305 310 315 320

Gly Gly Phe Glu Phe Ile Asp Leu Pro Pro Gly Ser Pro Arg Thr Thr
325 330 335

Met Gly Trp Asp Ile Thr Pro Asp Ile Leu Tyr Trp Gly Gly Lys Tyr
340 345 350

Leu Tyr Glu Arg Tyr Gly Lys Pro Met Phe Ile Thr Glu Asn Gly Ile
355 360 365

Ala Val Pro Glu Leu Val Asn Asp Glu Gly Gln Val Glu Asp Thr Val
370 375 380

Arg Glu Gln Tyr Met Lys Leu His Leu Arg Gly Leu Gln Arg Ala Arg
385 390 395 400

Asp Glu Gly Ile Pro Tyr Ala Gly Tyr Phe His Trp Ser Leu Leu Asp
405 410 415

Asn Phe Glu Trp Glu Gln Gly Tyr Ser Gln Arg Phe Gly Met Val Tyr
420 425 430

Val Asp Tyr Gln Thr Gln Glu Arg Ile Leu Lys Arg Ser Gly Gln His
435 440 445

Phe Ala Ala Ile Val Arg Glu Ile Thr Gly Thr Ala
450 455 460

<210> 39 
<211> 1521 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample



<400> 39 



gtgctcgccc ataaccgctc gcaccgtgaa gaactcctca atcgccggcc ggttgaattc 60

atcagcgccc tggaggcccg gggcgagctc cagcgcatca ccgccgaggt ggacccctac 120

ctcgagatca ccgagatctg cgatcgcacc ctgcgcgccg gcggcccggc gctgctgttc 180

gagaacgtca aggggcacga catgcctctg ctcggcaacc tcttcggcac gccgaagcgg 240

gttgccctcg gcatgggcca ggactccgtg gccgccctgc gcgaagtggg cgagctgctc 300

gccttcctca aggagccgga gcctcccaag ggctttcgcg acgcctggga caagctgccg 360

atcttcaagc aggtgatgag catggggccg aagaaggtcc gctcggcgcc ggtgcaggaa 420

aaggtgtacg agggcgacga ggtcgacctc gaccgcctgc cgatccagca ctgctggccc 480

ggcgacgccg cgcccctggt cacctggccg ctggtgatca cccgcgggcc ccacaagaag 540

cgccagaacc tcggcatcta ccgccagcag aagctgtcga agaaccggct gatcatgcgc 600

tggctctccc accgcggcgg ggcgctggac ttcctggagt tccagaaggc ccaccccggc 660

gagcccttcc cggtggcggt ggcgctgggc gccgacccgg cgaccatcct cggcgcggtg 720

accccggtgc cggattcgct ctccgagtac gccttcgccg ggctgctgcg cggctcgcgc 780

accgagctgg tcaagtgcgg ccacgccgac ctggacgtgc cggcctcggc ggagatcatc 840

ctggaggggt tcatctaccc ggatgacatg gcccccgagg gcccctacgg cgaccatacc 900

ggctactaca acgaggtgga taccttcccg gtcttcacgg tgacgcgtat gaccatgcgc 960

cgcgatgcca tctatcactc cacctacacc ggccggccgc ccgacgagcc ggcgatcctt 1020

gggctggcgc tcaacgaggt gttcgtgccg atcctgcgcc gccagttccc ggagatcgtc 1080






gacttctacc tgccgccgga gggctgctcc taccgcatgg cggtggtgac catgaagaag 1140 

cagtacccgg gccacgccaa gcgggtgatg atgggcgtgt ggagcttcct gcgccagttc 1200 

atgtacacca agttcgtggt ggtgctcgac gacgacgtca gcgcccggga ctgggaggac 1260 

gtgatctggg ccatcaccac ccgcatggac ccggcccggg acaccgtggt ggtggagaac 1320 

acccccatcg actacctgga cttcgcctcg ccggtctccg gcctcggttc caagatgggc 1380 

ctggatgcca ccagcaagtg gcccggcgag accgaccgcg agtggggggt gcccatcgtc 1440 

atggacgagg ccgtcaaggc ccgcgtcagc gagcgctgga acgagctggg catcgagctc 1500 

cccgacaaca cgaccccctg a 1521 

<210> 40 

<211> 506 

<212> PRT 

<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (21) . . . (445) 

<223> 3-octaprenyl-4-hydroxybenzoate carboxy-lyase 
<220> 

<221> SITE 
<222> (5) . . . (8) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 40 

Met Leu Ala His Asn Arg Ser His Arg Glu Glu Leu Leu Asn Arg Arg
1 5 10 15

Pro Val Glu Phe Ile Ser Ala Leu Glu Ala Arg Gly Glu Leu Gln Arg
20 25 30

Ile Thr Ala Glu Val Asp Pro Tyr Leu Glu Ile Thr Glu Ile Cys Asp
35 40 45

Arg Thr Leu Arg Ala Gly Gly Pro Ala Leu Leu Phe Glu Asn Val Lys
50 55 60

Gly His Asp Met Pro Leu Leu Gly Asn Leu Phe Gly Thr Pro Lys Arg
65 70 75 80

Val Ala Leu Gly Met Gly Gln Asp Ser Val Ala Ala Leu Arg Glu Val
85 90 95

Gly Glu Leu Leu Ala Phe Leu Lys Glu Pro Glu Pro Pro Lys Gly Phe
100 105 110

Arg Asp Ala Trp Asp Lys Leu Pro Ile Phe Lys Gln Val Met Ser Met
115 120 125

Gly Pro Lys Lys Val Arg Ser Ala Pro Val Gln Glu Lys Val Tyr Glu
130 135 140

Gly Asp Glu Val Asp Leu Asp Arg Leu Pro Ile Gln His Cys Trp Pro
145 150 155 160

Gly Asp Ala Ala Pro Leu Val Thr Trp Pro Leu Val Ile Thr Arg Gly
165 170 175

Pro His Lys Lys Arg Gln Asn Leu Gly Ile Tyr Arg Gln Gln Lys Leu
180 185 190

Ser Lys Asn Arg Leu Ile Met Arg Trp Leu Ser His Arg Gly Gly Ala
195 200 205

Leu Asp Phe Leu Glu Phe Gln Lys Ala His Pro Gly Glu Pro Phe Pro
210 215 220

Val Ala Val Ala Leu Gly Ala Asp Pro Ala Thr Ile Leu Gly Ala Val
225 230 235 240

Thr Pro Val Pro Asp Ser Leu Ser Glu Tyr Ala Phe Ala Gly Leu Leu
245 250 255

Arg Gly Ser Arg Thr Glu Leu Val Lys Cys Gly His Ala Asp Leu Asp
260 265 270

Val Pro Ala Ser Ala Glu Ile Ile Leu Glu Gly Phe Ile Tyr Pro Asp
275 280 285

Asp Met Ala Pro Glu Gly Pro Tyr Gly Asp His Thr Gly Tyr Tyr Asn
290 295 300

Glu Val Asp Thr Phe Pro Val Phe Thr Val Thr Arg Met Thr Met Arg
305 310 315 320

Arg Asp Ala Ile Tyr His Ser Thr Tyr Thr Gly Arg Pro Pro Asp Glu
325 330 335

Pro Ala Ile Leu Gly Leu Ala Leu Asn Glu Val Phe Val Pro Ile Leu
340 345 350

Arg Arg Gln Phe Pro Glu Ile Val Asp Phe Tyr Leu Pro Pro Glu Gly
355 360 365

Cys Ser Tyr Arg Met Ala Val Val Thr Met Lys Lys Gln Tyr Pro Gly
370 375 380

His Ala Lys Arg Val Met Met Gly Val Trp Ser Phe Leu Arg Gln Phe
385 390 395 400

Met Tyr Thr Lys Phe Val Val Val Leu Asp Asp Asp Val Ser Ala Arg
405 410 415

Asp Trp Glu Asp Val Ile Trp Ala Ile Thr Thr Arg Met Asp Pro Ala
420 425 430

Arg Asp Thr Val Val Val Glu Asn Thr Pro Ile Asp Tyr Leu Asp Phe
435 440 445

Ala Ser Pro Val Ser Gly Leu Gly Ser Lys Met Gly Leu Asp Ala Thr
450 455 460

Ser Lys Trp Pro Gly Glu Thr Asp Arg Glu Trp Gly Val Pro Ile Val
465 470 475 480

Met Asp Glu Ala Val Lys Ala Arg Val Ser Glu Arg Trp Asn Glu Leu
485 490 495

Gly Ile Glu Leu Pro Asp Asn Thr Thr Pro
500 505



<210> 41 
<211> 1410 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<400> 41 

atgaagacgc cttcgatcta cgataccatg acgcggtcgg tgcagccgtt gacacccgcc 60 
gacggcgaca ccttccgctt ttattgctgc ggccccaccg tctacgggcc ggcgcatgtc 120 
ggcaatttcc gcaccttcat cattcaggac gtgctgcgac gcgttatcga agggtcgggc 180 
ctcaaaacga gacacgtacg caacatcacc gatgtggacg acaaaaccat ccgccaatcg 240 
caagcggaag gaaaatctct gaaaatcttc acagggtact ggctggaacg gttccacgcc 300 
gattgcgacg cgctgaatct gctgcgcccg cacgtcgagc ccggcgccgt tgaccatatc 360
ccggcgcaaa tccggatgat cgaacaactg atcgaaaaag gccacgccta cgtggcggac 420
gacaactcgg tctattatcg cgttgcttcg ttcgaagcgt acggccggtt gtcacgcctg 480
caagaacgac acatcaccac cggctgcgcc gaacacgcgc ataccgacga tgaatacgag 540
cgcgaatcgg ccgccgactt cgccttgtgg aaagcgcata aatccgagga cggcccgaac 600
gcgtggccga gcccgtgggg cgacggacga cccggctggc acatcgagtg cagcgccatg 660
tccgtcgagt atctgggcga gacattcgat ctgcacggcg gcggcgtgga cctgaccttc 720
ccccaccacg aaaacgaaat cgcgcaaagc gaagccgcca ccggcaagcc cttcgcgcgt 780
atctggttcc attccgcgca tctcatggtc gaaggccaca agatgtccaa gagcctcggc 840
aacctgttta cgctcgacga tatccgcgcg cgcggattcg acgccatgac cctgcgctat 900
gtcctgcttt cgggcaatta ccgccaaccc ctcaatttca cgtgggactc ccttaacgcc 960
gcgcaaagcg ccttacgccg cctgcgtcag ctcaaccacg atctccagca ggcggcgggc 1020
aagacggtcg cgcccgctga tacttcgtgg gggccgttcg aaccggtgta cgacgcgctt 1080
gccgacaacc tgaacacgcc cgacgccctc ggccgcttat tctccgccct gcacagcatc 1140
gagcgcgcgc ttaacggcaa ggaaaggacg gccgaagagg ccgccctcgc ccgtgcgcag 1200
ttcctgcggg tcatggacct tttcggtttc agcctggacg cgccgccgac cgccgaagcg 1260
cccgaagaag tgcgtgcgct ggcgcagcag cgatgggacg ctaaacaagc gcgcgatttc 1320
gtccgcgccg acgccttgcg caaacaggtc accgacctcg gctggaccat ccgcgacgcc 1380
aaagacggct acgaactcgt ccaagagtaa 1410



<210> 42 
<211> 469 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (16) . . . (323)

<223> tRNA synthetases class I (C) catalytic domain
<220>

<221> SITE

<222> (69) . . . (72)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (316) . . . (319)

<223> N-glycosylation site. Prosite id = PS00001
<400> 42

Met Lys Thr Pro Ser Ile Tyr Asp Thr Met Thr Arg Ser Val Gln Pro
1 5 10 15

Leu Thr Pro Ala Asp Gly Asp Thr Phe Arg Phe Tyr Cys Cys Gly Pro
20 25 30

Thr Val Tyr Gly Pro Ala His Val Gly Asn Phe Arg Thr Phe Ile Ile
35 40 45

Gln Asp Val Leu Arg Arg Val Ile Glu Gly Ser Gly Leu Lys Thr Arg
50 55 60

His Val Arg Asn Ile Thr Asp Val Asp Asp Lys Thr Ile Arg Gln Ser
65 70 75 80

Gln Ala Glu Gly Lys Ser Leu Lys Ile Phe Thr Gly Tyr Trp Leu Glu
85 90 95

Arg Phe His Ala Asp Cys Asp Ala Leu Asn Leu Leu Arg Pro His Val
100 105 110

Glu Pro Gly Ala Val Asp His Ile Pro Ala Gln Ile Arg Met Ile Glu
115 120 125

Gln Leu Ile Glu Lys Gly His Ala Tyr Val Ala Asp Asp Asn Ser Val
130 135 140

Tyr Tyr Arg Val Ala Ser Phe Glu Ala Tyr Gly Arg Leu Ser Arg Leu
145 150 155 160

Gln Glu Arg His Ile Thr Thr Gly Cys Ala Glu His Ala His Thr Asp
165 170 175

Asp Glu Tyr Glu Arg Glu Ser Ala Ala Asp Phe Ala Leu Trp Lys Ala
180 185 190

His Lys Ser Glu Asp Gly Pro Asn Ala Trp Pro Ser Pro Trp Gly Asp
195 200 205

Gly Arg Pro Gly Trp His Ile Glu Cys Ser Ala Met Ser Val Glu Tyr
210 215 220

Leu Gly Glu Thr Phe Asp Leu His Gly Gly Gly Val Asp Leu Thr Phe
225 230 235 240

Pro His His Glu Asn Glu Ile Ala Gln Ser Glu Ala Ala Thr Gly Lys
245 250 255

Pro Phe Ala Arg Ile Trp Phe His Ser Ala His Leu Met Val Glu Gly
260 265 270

His Lys Met Ser Lys Ser Leu Gly Asn Leu Phe Thr Leu Asp Asp Ile
275 280 285

Arg Ala Arg Gly Phe Asp Ala Met Thr Leu Arg Tyr Val Leu Leu Ser
290 295 300

Gly Asn Tyr Arg Gln Pro Leu Asn Phe Thr Trp Asp Ser Leu Asn Ala
305 310 315 320

Ala Gln Ser Ala Leu Arg Arg Leu Arg Gln Leu Asn His Asp Leu Gln
325 330 335

Gln Ala Ala Gly Lys Thr Val Ala Pro Ala Asp Thr Ser Trp Gly Pro
340 345 350

Phe Glu Pro Val Tyr Asp Ala Leu Ala Asp Asn Leu Asn Thr Pro Asp
355 360 365

Ala Leu Gly Arg Leu Phe Ser Ala Leu His Ser Ile Glu Arg Ala Leu
370 375 380

Asn Gly Lys Glu Arg Thr Ala Glu Glu Ala Ala Leu Ala Arg Ala Gln
385 390 395 400

Phe Leu Arg Val Met Asp Leu Phe Gly Phe Ser Leu Asp Ala Pro Pro
405 410 415

Thr Ala Glu Ala Pro Glu Glu Val Arg Ala Leu Ala Gln Gln Arg Trp
420 425 430

Asp Ala Lys Gln Ala Arg Asp Phe Val Arg Ala Asp Ala Leu Arg Lys
435 440 445

Gln Val Thr Asp Leu Gly Trp Thr Ile Arg Asp Ala Lys Asp Gly Tyr
450 455 460

Glu Leu Val Gln Glu
465



<210> 43 
<211> 984
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 43 



atgacgactg aaaccaaatc caaactgtac ttgcataaag tgaacggcca gaaaggactg 60
gacctgcgcc agacctatca gcgcgacttc accgtgaccg aggcgtatcg cgatacgctg 120
ccggatatgc agaacgcttc cgaggcgttg cagggggcca atgtcgccat ccagaaagtc 180
ggcgtatcca atttcaagct gccactcaag taccgcaccc acacgggcga accgaccacg 240
ctggaaacca gcgtaaccgg cagcgtatcc ctgaagccgg gcctgaaggg catcaacatg 300
tcccgcgtca tgcggacctt ctacgacttc caggacgacg tgttcacgct cgacacgctg 360
gcccgtatac tggaagcgta caaacgggat gtcgacagca acgacgcaca tcttcggctg 420
agtttctcct acccgctgct tcaaaaaagt ctgcgcagcg aattattcgg ctggcaatat 480
taccaggtcg cattcgaggg acggatcgat gccgaaaatc gagtccgcac gttcattcat 540
tttgacttcg tgtattcctc cgcctgtccc tgttcggctg aactggccga acacgcgcgg 600
gaagtgcgcg gcctatacag catcccccac tcgcaacgca gcaaggcgcg cgtcttcgtg 660
gaagttcagc ccggcgccga actcaccatc gaagacgtgc acatgcactg cctgaacgcg 720
ctccaaacgg aaacgcaagt gatggtcaaa cgcgaagacg agcaggcgtt cgctgaaatg 780
aacggcgccg ccatcaaatt cgtcgaagac gccgcccgtc tgatctatga gcagttcgac 840
caggatccgc gcatcaagga tttcgaaatc gcctgcgcgc atctggaatc cttgcactcg 900
cacgacgccg tatcggtcat cgccaaaggc gtgcccggcg gcttccgcgc cgacttctcg 960
gacttcaaga gtctgatctg ctaa 984



<210> 44 
<211> 327 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 

<222> (39)...(308)

<223> Uncharacterized ACR, COG1469

<220>

<221> SITE

<222> (45)...(48)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (100)...(103)

<223> N-glycosylation site. Prosite id = PS00001
<400> 44 

Met Thr Thr Glu Thr Lys Ser Lys Leu Tyr Leu His Lys Val Asn Gly
1 5 10 15

Gln Lys Gly Leu Asp Leu Arg Gln Thr Tyr Gln Arg Asp Phe Thr Val
20 25 30

Thr Glu Ala Tyr Arg Asp Thr Leu Pro Asp Met Gln Asn Ala Ser Glu
35 40 45

Ala Leu Gln Gly Ala Asn Val Ala Ile Gln Lys Val Gly Val Ser Asn
50 55 60

Phe Lys Leu Pro Leu Lys Tyr Arg Thr His Thr Gly Glu Pro Thr Thr
65 70 75 80

Leu Glu Thr Ser Val Thr Gly Ser Val Ser Leu Lys Pro Gly Leu Lys
85 90 95

Gly Ile Asn Met Ser Arg Val Met Arg Thr Phe Tyr Asp Phe Gln Asp
100 105 110

Asp Val Phe Thr Leu Asp Thr Leu Ala Arg Ile Leu Glu Ala Tyr Lys
115 120 125

Arg Asp Val Asp Ser Asn Asp Ala His Leu Arg Leu Ser Phe Ser Tyr
130 135 140

Pro Leu Leu Gln Lys Ser Leu Arg Ser Glu Leu Phe Gly Trp Gln Tyr
145 150 155 160

Tyr Gln Val Ala Phe Glu Gly Arg Ile Asp Ala Glu Asn Arg Val Arg
165 170 175

Thr Phe Ile His Phe Asp Phe Val Tyr Ser Ser Ala Cys Pro Cys Ser
180 185 190

Ala Glu Leu Ala Glu His Ala Arg Glu Val Arg Gly Leu Tyr Ser Ile
195 200 205

Pro His Ser Gln Arg Ser Lys Ala Arg Val Phe Val Glu Val Gln Pro
210 215 220

Gly Ala Glu Leu Thr Ile Glu Asp Val His Met His Cys Leu Asn Ala
225 230 235 240

Leu Gln Thr Glu Thr Gln Val Met Val Lys Arg Glu Asp Glu Gln Ala
245 250 255

Phe Ala Glu Met Asn Gly Ala Ala Ile Lys Phe Val Glu Asp Ala Ala
260 265 270

Arg Leu Ile Tyr Glu Gln Phe Asp Gln Asp Pro Arg Ile Lys Asp Phe
275 280 285

Glu Ile Ala Cys Ala His Leu Glu Ser Leu His Ser His Asp Ala Val
290 295 300

Ser Val Ile Ala Lys Gly Val Pro Gly Gly Phe Arg Ala Asp Phe Ser
305 310 315 320

Asp Phe Lys Ser Leu Ile Cys
325



<210> 45 
<211> 1377 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 45 

atgacacaac tggcttttcc atctaacttc atctggggaa cagctacttc cgcttaccaa 60 

atcgaaggcg cctggaacgc agacggcaag ggcgaatcta tttgggatcg cttttcccat 120 

acgcagggga agatcattga cggcagcaac ggcgatgtgg cctgcgatca ctaccaccgc 180 

tggcgcgagg acgtggccct catgagagac ttgggtatgc aggcatatcg cttctccatc 240 

tcctggccac gcatcctgcc caccggtcat ggacagatca atcaggctgg gctggacttt 300 

tacaatcgcc tggtggacgg gttgctggaa gctggcatca agccctttgc caccctctac 360 

cactgggacc tgccgctggc gctacaggct gacggcggct ggccggagcg ctccacggcc 420 

aaggcctttg tcgaatacgc cgacgtggtc agccgcgcgc tgggcgatcg ggtgaagagc 480 

tggatcaccc ataacgaacc gtggtgcatc agcatgctga gccatcaaat tggggagcat 540 

gcgcccggct ggcgggactg gcaggctgcg ttggcggccg cgcaccacgt cctcctttcg 600 

catggttggg ccgtgccgga actgcgtcgc aacagccgcg atgcagaaat cggcatcacg 660 

ttgaacttta ccccggcgga gccagcttcg aacagcgcag ccgatttcaa ggcctatcgc 720 

cagttcgatg gctacttcaa ccgctggttc ctggacccgc tctatggccg ccactatccg 780

gcagatatgg tgcacgatta catcgcgcaa ggctacctgc catcacaggg tttgactttc 840

gtggaagctg gtgacctgga cgcgatcgcg acgcgcaccg atttcctggg tgtgaactat 900

tacacgcgcg aagtggtccg tagccaggaa atcccagaga gtgagaacgc gccgcgcaca 960

gtcttgcgcg cgccacagga agagtggaca gagatgggct gggaagtgta tcctgagggc 1020

ctctacaggt tgctcaatcg gttgcacttt gaataccagc cgcgcaagct ctacgtgacc 1080

gagagcggtt gcagctactc cgatggaccc ggccccaacg gtcggatacc ggaccaacgc 1140

cgtatcaact acctgcgcga tcacttcgca gcggcgcatc aggcgataca atgcggcgtc 1200

ccgctggccg gctacttcgt ctggtcgttc atggacaact tcgagtgggc caaagggtac 1260

acccaacgtt ttggtatcgt atgggtggat tatcaatcgc aacgacggat accgaaagac 1320

agcgcctact ggtatcgcga tgtcgtcgcc gccaacgcgg tgcaagttcc tgattag 1377



<210> 46 
<211> 458 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (2)...(454)

<223> Glycosyl hydrolase family 1
<220>

<221> SITE

<222> (10)...(24)

<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<400> 46 

Met Thr Gln Leu Ala Phe Pro Ser Asn Phe Ile Trp Gly Thr Ala Thr
1 5 10 15

Ser Ala Tyr Gln Ile Glu Gly Ala Trp Asn Ala Asp Gly Lys Gly Glu
20 25 30

Ser Ile Trp Asp Arg Phe Ser His Thr Gln Gly Lys Ile Ile Asp Gly
35 40 45

Ser Asn Gly Asp Val Ala Cys Asp His Tyr His Arg Trp Arg Glu Asp
50 55 60

Val Ala Leu Met Arg Asp Leu Gly Met Gln Ala Tyr Arg Phe Ser Ile
65 70 75 80

Ser Trp Pro Arg Ile Leu Pro Thr Gly His Gly Gln Ile Asn Gln Ala
85 90 95

Gly Leu Asp Phe Tyr Asn Arg Leu Val Asp Gly Leu Leu Glu Ala Gly
100 105 110

Ile Lys Pro Phe Ala Thr Leu Tyr His Trp Asp Leu Pro Leu Ala Leu
115 120 125

Gln Ala Asp Gly Gly Trp Pro Glu Arg Ser Thr Ala Lys Ala Phe Val
130 135 140

Glu Tyr Ala Asp Val Val Ser Arg Ala Leu Gly Asp Arg Val Lys Ser
145 150 155 160

Trp Ile Thr His Asn Glu Pro Trp Cys Ile Ser Met Leu Ser His Gln
165 170 175

Ile Gly Glu His Ala Pro Gly Trp Arg Asp Trp Gln Ala Ala Leu Ala
180 185 190

Ala Ala His His Val Leu Leu Ser His Gly Trp Ala Val Pro Glu Leu
195 200 205

Arg Arg Asn Ser Arg Asp Ala Glu Ile Gly Ile Thr Leu Asn Phe Thr
210 215 220

Pro Ala Glu Pro Ala Ser Asn Ser Ala Ala Asp Phe Lys Ala Tyr Arg
225 230 235 240

Gln Phe Asp Gly Tyr Phe Asn Arg Trp Phe Leu Asp Pro Leu Tyr Gly
245 250 255

Arg His Tyr Pro Ala Asp Met Val His Asp Tyr Ile Ala Gln Gly Tyr
260 265 270

Leu Pro Ser Gln Gly Leu Thr Phe Val Glu Ala Gly Asp Leu Asp Ala
275 280 285

Ile Ala Thr Arg Thr Asp Phe Leu Gly Val Asn Tyr Tyr Thr Arg Glu
290 295 300

Val Val Arg Ser Gln Glu Ile Pro Glu Ser Glu Asn Ala Pro Arg Thr
305 310 315 320

Val Leu Arg Ala Pro Gln Glu Glu Trp Thr Glu Met Gly Trp Glu Val
325 330 335

Tyr Pro Glu Gly Leu Tyr Arg Leu Leu Asn Arg Leu His Phe Glu Tyr
340 345 350

Gln Pro Arg Lys Leu Tyr Val Thr Glu Ser Gly Cys Ser Tyr Ser Asp
355 360 365

Gly Pro Gly Pro Asn Gly Arg Ile Pro Asp Gln Arg Arg Ile Asn Tyr
370 375 380

Leu Arg Asp His Phe Ala Ala Ala His Gln Ala Ile Gln Cys Gly Val
385 390 395 400

Pro Leu Ala Gly Tyr Phe Val Trp Ser Phe Met Asp Asn Phe Glu Trp
405 410 415

Ala Lys Gly Tyr Thr Gln Arg Phe Gly Ile Val Trp Val Asp Tyr Gln
420 425 430

Ser Gln Arg Arg Ile Pro Lys Asp Ser Ala Tyr Trp Tyr Arg Asp Val
435 440 445

Val Ala Ala Asn Ala Val Gln Val Pro Asp
450 455

<210> 47 
<211> 1353 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 47 



atgaaaaaat acctttttcc tgaaaatttt ttatggggtg ctgccacagc ttcgtatcaa 60

atcgaaggtt ctccctctgc tgatggcaaa ggtgaatcga tatgggaccg tttttctcac 120

acaccgggga acatttggaa cgctgaaacc gsggatatcg cctgcgatca ttaccggcgt 180

tacgtggatg atgtaaagct gatttcacaa atcgggctta acgcgtaccg tttttcaatt 240

tcctggccca gggtatttcc agaggggaga ggaaaagcaa atgaaaaagg actcgatttt 300

taccgcaggt tgattgaaca gctgcagcaa catcgaatca aaacggcagt gacactttac 360

cactgggatc ttccacaagt tctgcaggat cgcggcgggt gggcaaaccg tgatacggcg 420

aagtattttt ctgagtatgc cacctttctc tttgaaaaac tcgatctccc cgttgacatg 480

tggattactc ttaacgaacc stgggttatc gctattctgg ggcatgcttt tggtatccac 540

gctccaggga tgagtgactt cagcacagcc ctccaggtct cgcataacct gcttctgggg 600

cacgggttgg cggttaaagc atttcgggag ccuaagaggg gtgatgaacc ggtaggtatt 660

acccttaacc ttgccccggt tgaaccgctg accgaaaagc ccgccgatct aaaggcagct 720

ttactttctg acggttttat gaaccgctgg taccttgatc ccctgttcaa aggtggttac 780

cctgaagata tgatggatat ctattcccgg aacccugaac tgcccaaaat tgaaaagggg 840

gatgctcagg ttattgccga accgatcgac ttcctgggca taaataacta taccagggtt 900

ctcgtggaag ccagcggtga tgaaaatgcc tttatgggca accctgtcaa cccccagggc 960

tctgaatata ctgaaatggg ttgggaggtt tatccgcagg gtctctacga cctgctgacc 1020

agggttcacc gggattacgg gccaatgccg ctatatataa ctgaaaacgg ggcagccttt 1080

cccgatgaac ttgacagcaa tgggcagata gatgatccaa ggcggataaa ttacctggaa 1140

acttatcttc atcagtgctg gaaggcagtt caggacggtg tgcctctaaa aggctatttt 1200

gtctggaccc tgatggataa cttcgagtgg gctttcggtt tcagcaagcg atttgggctc 1260

atatacgtag attaccagga tcagaaacgt tacttgaaaa acagcgccta ctggtatagc 1320

aaggttattg ggcgaaacgg cctcgagcta taa 1353



<210> 48 
<211> 450 
<212> PRT 
<213> Unknown

<220>

<223> Obtained from environmental sample
<220>

<221> DOMAIN
<222> (4)...(448)

<223> Glycosyl hydrolase family 1
<220>

<221> SITE

<222> (10)...(24)

<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<220>

<221> SITE

<222> (300)...(303)

<223> N-glycosylation site. Prosite id = PS00001

<220>

<221> SITE

<222> (356)...(364)

<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572
<400> 48 

Met Lys Lys Tyr Leu Phe Pro Glu Asn Phe Leu Trp Gly Ala Ala Thr
1 5 10 15

Ala Ser Tyr Gln Ile Glu Gly Ser Pro Ser Ala Asp Gly Lys Gly Glu
20 25 30

Ser Ile Trp Asp Arg Phe Ser His Thr Pro Gly Asn Ile Trp Asn Ala
35 40 45

Glu Thr Gly Asp Ile Ala Cys Asp His Tyr Arg Arg Tyr Val Asp Asp
50 55 60

Val Lys Leu Ile Ser Gln Ile Gly Leu Asn Ala Tyr Arg Phe Ser Ile
65 70 75 80

Ser Trp Pro Arg Val Phe Pro Glu Gly Arg Gly Lys Ala Asn Glu Lys
85 90 95

Gly Leu Asp Phe Tyr Arg Arg Leu Ile Glu Gln Leu Gln Gln His Arg
100 105 110

Ile Lys Thr Ala Val Thr Leu Tyr His Trp Asp Leu Pro Gln Val Leu
115 120 125

Gln Asp Arg Gly Gly Trp Ala Asn Arg Asp Thr Ala Lys Tyr Phe Ser
130 135 140

Glu Tyr Ala Thr Phe Leu Phe Glu Lys Leu Asp Leu Pro Val Asp Met
145 150 155 160

Trp Ile Thr Leu Asn Glu Pro Trp Val Ile Ala Ile Leu Gly His Ala
165 170 175

Phe Gly Ile His Ala Pro Gly Met Ser Asp Phe Ser Thr Ala Leu Gln
180 185 190

Val Ser His Asn Leu Leu Leu Gly His Gly Leu Ala Val Lys Ala Phe
195 200 205

Arg Glu Ser Lys Arg Gly Asp Glu Pro Val Gly Ile Thr Leu Asn Leu
210 215 220

Ala Pro Val Glu Pro Leu Thr Glu Lys Pro Ala Asp Leu Lys Ala Ala
225 230 235 240

Leu Leu Ser Asp Gly Phe Met Asn Arg Trp Tyr Leu Asp Pro Leu Phe
245 250 255

Lys Gly Gly Tyr Pro Glu Asp Met Met Asp Ile Tyr Ser Arg Asn Phe
260 265 270

Glu Leu Pro Lys Ile Glu Lys Gly Asp Ala Gln Val Ile Ala Glu Pro
275 280 285

Ile Asp Phe Leu Gly Ile Asn Asn Tyr Thr Arg Val Leu Val Glu Ala
290 295 300

Ser Gly Asp Glu Asn Ala Phe Met Gly Asn Pro Val Asn Pro Gln Gly
305 310 315 320

Ser Glu Tyr Thr Glu Met Gly Trp Glu Val Tyr Pro Gln Gly Leu Tyr
325 330 335

Asp Leu Leu Thr Arg Val His Arg Asp Tyr Gly Pro Met Pro Leu Tyr
340 345 350

Ile Thr Glu Asn Gly Ala Ala Phe Pro Asp Glu Leu Asp Ser Asn Gly
355 360 365

Gln Ile Asp Asp Pro Arg Arg Ile Asn Tyr Leu Glu Thr Tyr Leu His
370 375 380

Gln Cys Trp Lys Ala Val Gln Asp Gly Val Pro Leu Lys Gly Tyr Phe
385 390 395 400

Val Trp Thr Leu Met Asp Asn Phe Glu Trp Ala Phe Gly Phe Ser Lys
405 410 415

Arg Phe Gly Leu Ile Tyr Val Asp Tyr Gln Asp Gln Lys Arg Tyr Leu
420 425 430

Lys Asn Ser Ala Tyr Trp Tyr Ser Lys Val Ile Gly Arg Asn Gly Leu
435 440 445

Glu Leu
450



<210> 49 
<211> 591 
<212> DNA 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<400> 49

atggactttg agcgggcagt tgacaggaat atcattagat tacgctcttc gttaaaggaa 60

gaaatgaagg atctagttgc agttgaagct ccggtaacaa tatttttaaa tggcagcgag 120

ctggtaaccc tgctctgcac cccggagaaa attgatcgtt tggccctcgg tttccttcat 180

tcagaagggc tgcttaactc acttgatgat cttagtatga tcaggaccag ggagagcgaa 240

ggcctggttg aaattgaact taaagaggcc tcgccggcac ttgataaatt atacgggaag 300

aggacaatta cttccggttg cggtaaggga acaatttttt ttaatgttct cgattctctg 360

cgcagtaaac cactcgacgg aaagcttgtg attacaaccg aagagattca taaattaatg 420

gatgacctgc aggggcgggc ggaactgttc aaggctaccg ggggtgttca cagcgctgcg 480

cttgccgaca gaaaggaaat actctttttc agtgaagata tcggccgcca taatgctatc 540

gataaaattg tgggagagtg tttgctggag ggggtatctc ctgaagataa g 591



<210> 50 
<211> 197 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (17)...(195)
<223> FdhD/NarQ family

<220>

<221> SITE

<222> (37)...(40)

<223> N-glycosylation site. Prosite id = PS00001
<400> 50 

Met Asp Phe Glu Arg Ala Val Asp Arg Asn Ile Ile Arg Leu Arg Ser
1 5 10 15

Ser Leu Lys Glu Glu Met Lys Asp Leu Val Ala Val Glu Ala Pro Val
20 25 30

Thr Ile Phe Leu Asn Gly Ser Glu Leu Val Thr Leu Leu Cys Thr Pro
35 40 45

Glu Lys Ile Asp Arg Leu Ala Leu Gly Phe Leu His Ser Glu Gly Leu
50 55 60

Leu Asn Ser Leu Asp Asp Leu Ser Met Ile Arg Thr Arg Glu Ser Glu
65 70 75 80

Gly Leu Val Glu Ile Glu Leu Lys Glu Ala Ser Pro Ala Leu Asp Lys
85 90 95

Leu Tyr Gly Lys Arg Thr Ile Thr Ser Gly Cys Gly Lys Gly Thr Ile
100 105 110

Phe Phe Asn Val Leu Asp Ser Leu Arg Ser Lys Pro Leu Asp Gly Lys
115 120 125

Leu Val Ile Thr Thr Glu Glu Ile His Lys Leu Met Asp Asp Leu Gln
130 135 140

Gly Arg Ala Glu Leu Phe Lys Ala Thr Gly Gly Val His Ser Ala Ala
145 150 155 160

Leu Ala Asp Arg Lys Glu Ile Leu Phe Phe Ser Glu Asp Ile Gly Arg
165 170 175

His Asn Ala Ile Asp Lys Ile Val Gly Glu Cys Leu Leu Glu Gly Val
180 185 190

Ser Pro Glu Asp Lys
195



<210> 51 
<211> 1014 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 51 

atgtccaggg gcatcctgat cctcgtcatg ctgtctgttc tgagcggcgc ggcgctggcc 60 

caaccggccg ggctgccgcc gcgttcgccg gtgcagcgct gcatcaacct gggcaatatg 120 

ctggaagcgc cggaggaggg ctggtggggg ctgcgcgtcg agcgcgacta cctgacgacg 180 

atcgccgggg ccgggttcga tgcggtgcgc atcccgataa gctggtcaac ccatgctgcc 240 

agcgagccgc cctacaccat cgatccggct ttcttcgccc gcgttgatga agtcgtcggc 300 

tgggcgctgg cggacgggct gaaggccatc atcaacgtgc accactacga ggagatgatg 360 

agcgatccgg cggggcattt cccccggctg cgcgcgctgt gggcgcagat cgcggagcac 420 

tacgccgact acccgcccgc gctgatgttc gagctgctca acgaaccgtt cgaggcgctg 480 

acgccgctgc ggtggaacga gtacgccgcc gatctgatcg cgctgatccg ccagaccaac 540 

ccggggcgca ccctgatcgt cggcgggggc tggtggaaca gtgtggaagg gctgatgcag 600 

[nucleotides 601-660 illegible in source] 660

[nucleotides 661-720 illegible in source] 720


gggacgggcg aggaacggct cgatctggag tccaatatcc gtattgcggc ggcctgggcg 780

gtgtacaacc ggcgcccgct gctgttgggc gaattcggcg tctatggccg ggtggccgat 840

ctcgattcgc gcctgcgctg gacgacggcg gtgcgcgccg aggccgaggc gcagggcatc 900

ggctggtgct actgggaatt cgccgccggc ttcggcattt acgacccgga aagccggacg 960

ttcaacccgc tgtaccgcgc gctgatcccg caggccgggc cggcgcgccc ctga 1014



<210> 52 
<211> 337 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1)...(20)

<220>

<221> DOMAIN
<222> (38)...(314)

<223> Cellulase (glycosyl hydrolase family 5)
<220>

<221> SITE

<222> (150)...(159)

<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659 
<400> 52 

Met Ser Arg Gly Ile Leu Ile Leu Val Met Leu Ser Val Leu Ser Gly
1 5 10 15

Ala Ala Leu Ala Gln Pro Ala Gly Leu Pro Pro Arg Ser Pro Val Gln
20 25 30

Arg Cys Ile Asn Leu Gly Asn Met Leu Glu Ala Pro Glu Glu Gly Trp
35 40 45

Trp Gly Leu Arg Val Glu Arg Asp Tyr Leu Thr Thr Ile Ala Gly Ala
50 55 60

Gly Phe Asp Ala Val Arg Ile Pro Ile Ser Trp Ser Thr His Ala Ala
65 70 75 80

Ser Glu Pro Pro Tyr Thr Ile Asp Pro Ala Phe Phe Ala Arg Val Asp
85 90 95

Glu Val Val Gly Trp Ala Leu Ala Asp Gly Leu Lys Ala Ile Ile Asn
100 105 110

Val His His Tyr Glu Glu Met Met Ser Asp Pro Ala Gly His Phe Pro
115 120 125

Arg Leu Arg Ala Leu Trp Ala Gln Ile Ala Glu His Tyr Ala Asp Tyr
130 135 140

Pro Pro Ala Leu Met Phe Glu Leu Leu Asn Glu Pro Phe Glu Ala Leu
145 150 155 160

Thr Pro Leu Arg Trp Asn Glu Tyr Ala Ala Asp Leu Ile Ala Leu Ile
165 170 175

Arg Gln Thr Asn Pro Gly Arg Thr Leu Ile Val Gly Gly Gly Trp Trp
180 185 190

Asn Ser Val Glu Gly Leu Met Gln Leu Arg Leu Pro Asp Asp Pro Asp
195 200 205

Leu Leu Ala Thr Phe His Tyr Tyr His Pro Phe Glu Phe Thr His Gln
210 215 220

Gly Ala Glu Trp Ser Pro Glu Val Thr Asp Leu Ser Gly Ile Ala Trp
225 230 235 240

Gly Thr Gly Glu Glu Arg Leu Asp Leu Glu Ser Asn Ile Arg Ile Ala
245 250 255

Ala Ala Trp Ala Val Tyr Asn Arg Arg Pro Leu Leu Leu Gly Glu Phe
260 265 270

Gly Val Tyr Gly Arg Val Ala Asp Leu Asp Ser Arg Leu Arg Trp Thr
275 280 285

Thr Ala Val Arg Ala Glu Ala Glu Ala Gln Gly Ile Gly Trp Cys Tyr
290 295 300

Trp Glu Phe Ala Ala Gly Phe Gly Ile Tyr Asp Pro Glu Ser Arg Thr
305 310 315 320

Phe Asn Pro Leu Tyr Arg Ala Leu Ile Pro Gln Ala Gly Pro Ala Arg
325 330 335

Pro

<210> 53
<211> 1377
<212> DNA
<213> Unknown

<220>

<223> Obtained from environmental sample
<400> 53 



[nucleotides 1-40 illegible in source] atgtgcctgg cccacgctac 60

accagttatc [nucleotides 71-100 illegible in source] tttcactaaa gcagtggcaa 120

caaacgctca aaaaatcctt [nucleotides 141-180 illegible in source] 180

catttgccat [nucleotides 191-220 illegible in source] gccataaacg tgtgactaaa 240

[nucleotides 241-280 illegible in source] aagaatggga tttatattgc 300

caacttttgg [nucleotides 311-340 illegible in source] tgggtggggg aactccgaca 360

ttttttagtc ctgaacattt [nucleotides 381-400 illegible in source] tattggctaa agccgaagtt 420

gcagatgagc atgagtttag [nucleotides 441-480 illegible in source] 480

caagcgctct atgatgttgg [nucleotides 501-540 illegible in source] 540

actgtgcaaa aagccattca ccgcattcag [nucleotides 571-600 illegible in source] 600

tgggcgcgtg agattggcta tacctctatt [nucleotides 631-660 illegible in source] 660

caaagtttag acgatgtctt aaatacgatt [nucleotides 691-720 illegible in source] 720


ttggctttgt atagctatgc ccatgtgcca tggattaaag gcaatggtca acgcggtttt 780

aaagatgctg atgtcccgaa agacgagatt aaacgtcaat gttatgagga aggcaaaaaa 840

aaattattag aacatggcta tcatgaaatt ggtatggatc attttgctct agaacaagac 900

agtatgtatc agtcttttaa agcagggagc ttgcatcgta atttcatggg ttataccgca 960

tcgaaaacgc aagtgatgat tgggcttggg atttcatcaa ttagtgacag ttggtacagc 1020

tttgcgcaaa acgtgaaaac attagatgaa tattatacct tgctagaaaa aaatcagatt 1080

cccgtgttta aagggcatgt cttgaatcag gaagatttga tcatccgtaa acatatttta 1140

aatttgatgt gtggcttcca aacctcatgg gcaaatcccg atatgcaatt tcctgaaatt 1200

cagtctgttt tggcacaatt agcagaaatg cagcaagatg gtttgattca aattgaagac 1260

gcatcggtca cagttttaga agcgggcaag ccttttgttc gaaatatttg tatggccttt 1320

gatttaagac tcaagcgcaa caagcctgag aatcggattt tttcgatgac gatttaa 1377



<210> 54 
<211> 458 
<212> PRT 
<213> Unknown

<220>

<223> Obtained from environmental sample
<220>

<221> DOMAIN

<222> (59)...(233)

<223> Radical SAM superfamily

<220>

<221> DOMAIN

<222> (316)...(431)

<223> HemN C-terminal region

<220>

<221> SITE

<222> (33)...(36)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (51)...(54)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (155)...(158)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (181)...(184)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (200)...(203)

<223> N-glycosylation site. Prosite id = PS00001
<400> 54 

Met Trp Met Val Gln Ala Thr Ser Leu Ile Gln Lys Tyr Asn Val Pro
1 5 10 15

Gly Pro Arg Tyr Thr Ser Tyr Pro Thr Val Pro Tyr Trp Glu Ser Glu
20 25 30

Asn Phe Ser Leu Lys Gln Trp Gln Gln Thr Leu Lys Lys Ser Phe Asp
35 40 45

Glu Ser Asn Gln Ser Glu Gly Ile Ser Leu Tyr Ile His Leu Pro Phe
50 55 60

Cys Glu Ser Leu Cys Thr Phe Cys Gly Cys His Lys Arg Val Thr Lys
65 70 75 80

Lys His Glu Met Glu Lys Pro Tyr Ile Gln Ala Val Leu Lys Glu Trp
85 90 95

Asp Leu Tyr Cys Gln Leu Leu Val Asp Lys Pro Val Ile Lys Glu Ile
100 105 110

His Leu Gly Gly Gly Thr Pro Thr Phe Phe Ser Pro Glu His Leu Thr
115 120 125

Gln Leu Ile Lys Gly Ile Leu Ala Lys Ala Glu Val Ala Asp Glu His
130 135 140

Glu Phe Ser Phe Glu Gly His Pro Asn Asn Thr Thr Arg Glu His Leu
145 150 155 160

Gln Ala Leu Tyr Asp Val Gly Phe Arg Arg Val Ser Tyr Gly Val Gln
165 170 175

Asp Tyr Asn Glu Thr Val Gln Lys Ala Ile His Arg Ile Gln Pro Tyr
180 185 190

Glu Asn Val Lys Asn Val Thr Glu Trp Ala Arg Glu Ile Gly Tyr Thr
195 200 205

Ser Ile Ser His Asp Leu Val Phe Gly Leu Pro Phe Gln Ser Leu Asp
210 215 220

Asp Val Leu Asn Thr Ile Asp Gln Thr Asn Thr Leu Met Pro Asp Arg
225 230 235 240

Leu Ala Leu Tyr Ser Tyr Ala His Val Pro Trp Ile Lys Gly Asn Gly
245 250 255

Gln Arg Gly Phe Lys Asp Ala Asp Val Pro Lys Asp Glu Ile Lys Arg
260 265 270

Gln Cys Tyr Glu Glu Gly Lys Lys Lys Leu Leu Glu His Gly Tyr His
275 280 285

Glu Ile Gly Met Asp His Phe Ala Leu Glu Gln Asp Ser Met Tyr Gln
290 295 300

Ser Phe Lys Ala Gly Ser Leu His Arg Asn Phe Met Gly Tyr Thr Ala
305 310 315 320

Ser Lys Thr Gln Val Met Ile Gly Leu Gly Ile Ser Ser Ile Ser Asp
325 330 335

Ser Trp Tyr Ser Phe Ala Gln Asn Val Lys Thr Leu Asp Glu Tyr Tyr
340 345 350

Thr Leu Leu Glu Lys Asn Gln Ile Pro Val Phe Lys Gly His Val Leu
355 360 365

Asn Gln Glu Asp Leu Ile Ile Arg Lys His Ile Leu Asn Leu Met Cys
370 375 380

Gly Phe Gln Thr Ser Trp Ala Asn Pro Asp Met Gln Phe Pro Glu Ile
385 390 395 400

Gln Ser Val Leu Ala Gln Leu Ala Glu Met Gln Gln Asp Gly Leu Ile
405 410 415

Gln Ile Glu Asp Ala Ser Val Thr Val Leu Glu Ala Gly Lys Pro Phe
420 425 430

Val Arg Asn Ile Cys Met Ala Phe Asp Leu Arg Leu Lys Arg Asn Lys
435 440 445

Pro Glu Asn Arg Ile Phe Ser Met Thr Ile
450 455



<210> 55 
<211> 1389 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 55 



atgagcgctt cgagtccctc ccgccccctg tccttcccag agcagttcgt ctggggtgct 60

gccgcggcct cctaccaagt cgagggcgcc gtccacgagg acgggaaggg cccctccgtc 120

tgggacatgt tctgcgagaa gcccggagcg gtcttccagg ggcacgacgg ggcggtggct 180

tgcgaccact atcaccgcta ccgagaggac gtggcgttga tgcgacaggt gggcctgcac 240

gcctaccgcc tgagcgtgtg ctggccccga gtgctcccgg agggcgtcgg gcagcccaac 300

gagaagggcc tcgacttcta ctcgcggttg gtggacgcgc tgctcgaggc agggattacg 360

ccctgggtaa cgctttttca ttgggactac cccttggctc tctatcaccg gggggggctgg 420

ctcaaccggg atagcgcgga ttggtttgcc gagtacgcgg gcctaatcgc cgatcgcctc 480

tccgaccggg tgcagcattt cttcactcag aacgagcccc aggtctatat cggcttcgga 540

cacctcgagg gtaagcatgc tccaggagac accttgccca tgtcccaggt gctgcttgcg 600

gggcatcata gcctactggc gcacggcaag gccgtgcagg cgctccgcgc ccaggcgaag 660

cagcagctgc gcgtcggcta cgctcccgtc ggcatgcccc tccatccctt cacggactcg 720

gccgaggacg tggccgctgc gcggaaggcg accttttggg ttcgggagaa gaactcctgg 780

aacaacgcct ggtggatgga cccggtgttc ttgggtgagt acccggctca gggcctcgcc 840

ttcttcggcc gggacgtgcc gcaggtgcgc gagggagaca tgcagctcat cgcgcagccc 900

ttggacttct ttggggtcaa catctaccag agcacccccg tgcgcgcgtc tagcgccgaa 960

agcggcttcg aggtcgtccc ccatccaacg ggctatccta tcactgcctt caactggccg 1020

atcacgcccc aggccctcta ctggggtccg cgcttcttct acgagcgcta ccagaagccg 1080

atcgtcatca cggagaacgg actgtcctgt cgggacgtcg tcgctgtgga cgggaaggtt 1140

cacgatccgg ctcgcatcga tttcaccacc cgctatctgc gcgagctcca ccgagccgtc 1200

gcggacggcg tcgcggtcga gggctacttc cactggtcca tcatggacaa cttcgaatgg 1260

gctgccggct accgcgagcg gttcgggctc attcacgtcg actacgagac cctggcgcgg 1320

acgcccaagg cgtccgctgc gtggtatcgc aaggtaatcg agagcaacgg agcgaccctt 1380

ttcggatga 1389



<210> 56 
<211> 462 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample
<220>

<221> DOMAIN
<222> (8)...(458)

<223> Glycosyl hydrolase family 1
<220>

<221> SITE

<222> (16)...(30)

<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<220>

<221> SITE

<222> (366)...(374)

<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572
<400> 56 

Met Ser Ala Ser Ser Pro Ser Arg Pro Leu Ser Phe Pro Glu Gln Phe
1 5 10 15

Val Trp Gly Ala Ala Ala Ala Ser Tyr Gln Val Glu Gly Ala Val His
20 25 30

Glu Asp Gly Lys Gly Pro Ser Val Trp Asp Met Phe Cys Glu Lys Pro
35 40 45

Gly Ala Val Phe Gln Gly His Asp Gly Ala Val Ala Cys Asp His Tyr
50 55 60

His Arg Tyr Arg Glu Asp Val Ala Leu Met Arg Gln Val Gly Leu His
65 70 75 80

Ala Tyr Arg Leu Ser Val Cys Trp Pro Arg Val Leu Pro Glu Gly Val
85 90 95

Gly Gln Pro Asn Glu Lys Gly Leu Asp Phe Tyr Ser Arg Leu Val Asp
100 105 110

Ala Leu Leu Glu Ala Gly Ile Thr Pro Trp Val Thr Leu Phe His Trp
115 120 125

Asp Tyr Pro Leu Ala Leu Tyr His Arg Gly Gly Trp Leu Asn Arg Asp
130 135 140

Ser Ala Asp Trp Phe Ala Glu Tyr Ala Gly Leu Ile Ala Asp Arg Leu
145 150 155 160

Ser Asp Arg Val Gln His Phe Phe Thr Gln Asn Glu Pro Gln Val Tyr
165 170 175

Ile Gly Phe Gly His Leu Glu Gly Lys His Ala Pro Gly Asp Thr Leu
180 185 190

Pro Met Ser Gln Val Leu Leu Ala Gly His His Ser Leu Leu Ala His
195 200 205

Gly Lys Ala Val Gln Ala Leu Arg Ala Gln Ala Lys Gln Gln Leu Arg
210 215 220

Val Gly Tyr Ala Pro Val Gly Met Pro Leu His Pro Phe Thr Asp Ser
225 230 235 240

Ala Glu Asp Val Ala Ala Ala Arg Lys Ala Thr Phe Trp Val Arg Glu
245 250 255

Lys Asn Ser Trp Asn Asn Ala Trp Trp Met Asp Pro Val Phe Leu Gly
260 265 270

Glu Tyr Pro Ala Gln Gly Leu Ala Phe Phe Gly Arg Asp Val Pro Gln
275 280 285

Val Arg Glu Gly Asp Met Gln Leu Ile Ala Gln Pro Leu Asp Phe Phe
290 295 300

Gly Val Asn Ile Tyr Gln Ser Thr Pro Val Arg Ala Ser Ser Ala Glu
305 310 315 320

Ser Gly Phe Glu Val Val Pro His Pro Thr Gly Tyr Pro Ile Thr Ala
325 330 335

Phe Asn Trp Pro Ile Thr Pro Gln Ala Leu Tyr Trp Gly Pro Arg Phe
340 345 350

Phe Tyr Glu Arg Tyr Gln Lys Pro Ile Val Ile Thr Glu Asn Gly Leu
355 360 365

Ser Cys Arg Asp Val Val Ala Val Asp Gly Lys Val His Asp Pro Ala
370 375 380

Arg Ile Asp Phe Thr Thr Arg Tyr Leu Arg Glu Leu His Arg Ala Val
385 390 395 400

Ala Asp Gly Val Ala Val Glu Gly Tyr Phe His Trp Ser Ile Met Asp
405 410 415

Asn Phe Glu Trp Ala Ala Gly Tyr Arg Glu Arg Phe Gly Leu Ile His
420 425 430

Val Asp Tyr Glu Thr Leu Ala Arg Thr Pro Lys Ala Ser Ala Ala Trp
435 440 445

Tyr Arg Lys Val Ile Glu Ser Asn Gly Ala Thr Leu Phe Gly
450 455 460



<210> 57 
<211> 414 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 



<400> 57 

atgattgctt catctatgtt ctatggaacg gttcgtggaa tacaagagct aactcaaaac 60

[nucleotides 61-90 illegible in source] cttaccagag ttgctgacgg aagtgatttt 120


gagtttgata gagttattga acgctcgatt gaaaacgtaa ccgaactatc aggtaagcta 180

actgattaca tggatttagt aacggagttt gctagaactg gtaaaacaat agatgaatct 240

tttaatttag ctaatacaac acaaatgtta atgaatattt ctgaattaac agcagatgaa 300

tcagtaaata gtttaactgc cgcaatgatt gcttttaata ttaacgcaga tgatagtatt 360

agaattgctg ataagttgaa tgaggttaac aatatcagcc tccttttgtg gtaa 414



<210> 58 
<211> 137 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> SITE 

<222> (52)...(55)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (86)...(89)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (93)...(96)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (133)...(136)

<223> N-glycosylation site. Prosite id = PS00001
<400> 58 

Met Ile Ala Ser Ser Met Phe Tyr Gly Thr Val Arg Gly Ile Gln Glu
1 5 10 15

Leu Thr Gln Asn Val Ile Ala Leu Asp Thr Ala Met Val Ser Leu Thr
20 25 30

Arg Val Ala Asp Gly Ser Asp Phe Glu Phe Asp Arg Val Ile Glu Arg
35 40 45

Ser Ile Glu Asn Val Thr Glu Leu Ser Gly Lys Leu Thr Asp Tyr Met
50 55 60

Asp Leu Val Thr Glu Phe Ala Arg Thr Gly Lys Thr Ile Asp Glu Ser
65 70 75 80

Phe Asn Leu Ala Asn Thr Thr Gln Met Leu Met Asn Ile Ser Glu Leu
85 90 95

Thr Ala Asp Glu Ser Val Asn Ser Leu Thr Ala Ala Met Ile Ala Phe
100 105 110

Asn Ile Asn Ala Asp Asp Ser Ile Arg Ile Ala Asp Lys Leu Asn Glu
115 120 125

Val Asn Asn Ile Ser Leu Leu Leu Trp
130 135

<210> 59 
<211> 1044 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 



<400> 59 



atgagaattt ttgaaggatt tcagcgaggt gtaaaccttg gcggctggat ctcccagttc 60

gacaagtacg accatgagca tttccgcagc tttattacgg aaaatgacat cgccgccatt 120

gcagctcttg gttttgacca tgtccgcgtg ccggtggatt ataacgtgct ggaggatgag 180

gagggcaacc gcatcgacag cggatttgtc tacctgagaa gctgctacga gtggtgccgc 240

aaacacgacc tgaacatgct ggtggatctt cacgagtgct acggctactc cttcgatccg 300

ctgaaaaaag atatggaccg caaacgcttc ttctatgccg aagctctgca ggagcgtttt 360

ctgaagctct gggagcagat ctgtgaaacc tttaaagacg atcctgtgca cgtggcattc 420

gagccgctga atgagatcgt tttaggagag gtcgcagacg cctggaacgt gatgatccgc 480

aaatatatca agaccgtccg cgccatctgc Gcggagcact atctggtcct tggaagcgtg 540

cactacagcc acgttaccac catccctctt cttgaggcac cggcagatga caagatcgtc 600

ttcaacttcG actgctacga gccgctggtc ttcacccacc agggcgcata ctggctggag 660

gatatgattc cggatttccg catgacctat cctgccacca tggaagagtt ctacgaagca 720

acaaagaaga tcctgccaaa catgagtccg gatggattta aggatttcga tcaggagatg 780

ggtccgggct tctttgagaa gatcttcaca ccggccctga aacgtgccga gcaggacaat 840

gtagccctct actgcggcga gtacggcgtc attgatctgg cagataacca tgccaagatc 900

cgctggctca aagacatcca caccaccttc tccaaatacg gcatcggaag tgccctctgg 960

aactacaagg gcaaggattt cggctatgta gatgatcgct tcgccgagtg cagagaagca 1020

tttatcgagt gcctgaaggc ctga 1044
<210> 60
<211> 347 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN
<222> (14) . . . (330) 

<223> Cellulase (glycosyl hydrolase family 5) 
<400> 60 

Met Arg Ile Phe Glu Gly Phe Gln Arg Gly Val Asn Leu Gly Gly Trp
1 5 10 15

Ile Ser Gln Phe Asp Lys Tyr Asp His Glu His Phe Arg Ser Phe Ile
20 25 30

Thr Glu Asn Asp Ile Ala Ala Ile Ala Ala Leu Gly Phe Asp His Val
35 40 45

Arg Val Pro Val Asp Tyr Asn Val Leu Glu Asp Glu Glu Gly Asn Arg
50 55 60

Ile Asp Ser Gly Phe Val Tyr Leu Arg Ser Cys Tyr Glu Trp Cys Arg
65 70 75 80

Lys His Asp Leu Asn Met Leu Val Asp Leu His Glu Cys Tyr Gly Tyr
85 90 95

Ser Phe Asp Pro Leu Lys Lys Asp Met Asp Arg Lys Arg Phe Phe Tyr
100 105 110

Ala Glu Ala Leu Gln Glu Arg Phe Leu Lys Leu Trp Glu Gln Ile Cys
115 120 125

Glu Thr Phe Lys Asp Asp Pro Val His Val Ala Phe Glu Pro Leu Asn
130 135 140

Glu Ile Val Leu Gly Glu Val Ala Asp Ala Trp Asn Val Met Ile Arg
145 150 155 160

Lys Tyr Ile Lys Thr Val Arg Ala Ile Cys Pro Glu His Tyr Leu Val
165 170 175

Leu Gly Ser Val His Tyr Ser His Val Thr Thr Ile Pro Leu Leu Glu
180 185 190

Ala Pro Ala Asp Asp Lys Ile Val Phe Asn Phe His Cys Tyr Glu Pro
195 200 205

Leu Val Phe Thr His Gln Gly Ala Tyr Trp Leu Glu Asp Met Ile Pro
210 215 220

Asp Phe Arg Met Thr Tyr Pro Ala Thr Met Glu Glu Phe Tyr Glu Ala
225 230 235 240

Thr Lys Lys Ile Leu Pro Asn Met Ser Pro Asp Gly Phe Lys Asp Phe
245 250 255

Asp Gln Glu Met Gly Pro Gly Phe Phe Glu Lys Ile Phe Thr Pro Ala
260 265 270

Leu Lys Arg Ala Glu Gln Asp Asn Val Ala Leu Tyr Cys Gly Glu Tyr
275 280 285

Gly Val Ile Asp Leu Ala Asp Asn His Ala Lys Ile Arg Trp Leu Lys
290 295 300

Asp Ile His Thr Thr Phe Ser Lys Tyr Gly Ile Gly Ser Ala Leu Trp
305 310 315 320

Asn Tyr Lys Gly Lys Asp Phe Gly Tyr Val Asp Asp Arg Phe Ala Glu
325 330 335

Cys Arg Glu Ala Phe Ile Glu Cys Leu Lys Ala
340 345



<210> 61 
<211> 1230 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<400> 61 



ttggtatgga caccagctcg atcaacgctt gctggatctt ctgaaatccc actaatgaca 60

atgaatatat tccccaatag aaaagactca cgaatgtccc tctggatcaa gcttggcata 120

ctttgtatga tggctggaac ggtgatggtt cacggagcgc agactggtca aggagaagca 180

acaatgaatc aagcaaatgg cttcaaggta agcaacggga ccaatatcag ccattggttg 240

tcccagtgtt ttgaaacaat gccaccccgg cgcggatttt tctccgaact ggatgttatc 300

ttcatccgct cgctggggat ggatcatttc cgtcttccgg tggacgagaa ggaactttgg 360

acggaggatc ttgagaagat tcccgaagcg tgggattacc tcaggaatgc tctaagctgg 420

gctagaaagc atgagcttcg tgtgattgtg gatcttcacg tcgtgcggtc ccatcacttt 480

aatgcggcaa atgaaggggg aaccaacact ctgtgggatg atccggaggc gcaggaaagt 540

ttcctcaacc tttggaggca gctttcggca gagctcgcct acaccgatgt ggactgggtg 600

gcctatgaga tcatgaatga ggccgtcgcg gatgatccgg aggactggaa tcgtctcatc 660

gccaaagccc actccttgat ccgcgagcgt gagccaaggc gcacactcgt catcggatcc 720

aaccggtggc aaattccgtc aacgttcccg gatctgaaga ttccggacgg agabccgaac 780

atcctcctga gtttccattt ctacgcgcct ctgcttttca cccactatcg ggcaacctgg 840

gttgcctttt acgattatga tgggccggtt tcctatcctg gcaggatcgt tgatgahgca 900

gctcttgaga aaaatgatta tactcctgca ttcaaagaca agattcgtgc gttgaatggt 960

gtgtatgaca tcgacgctct cgaaaaagaa atgcagccgg ctatcgaata cgcaaaacag 1020

aaagggttac cactgtattg cggagagtgg ggatgttttc atgctgtgga aagaaaacaa 1080

cgcttgcaat ggtacaaaga tatatccact attttgaaac gcaatgggat cgcccatgcc 1140

acatgggatt acaagggcga gttcggcatt gtggacactt ggacactagg tgttgattgg 1200

aatttggtag gagcaatcct gtcagagtag 1230



<210> 62 
<211> 409 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (62) . . . (390) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (73) . . . (76) 

<223> N-glycosylation site. Prosite id = PS00001

<400> 62

Met Val Trp Thr Pro Ala Arg Ser Thr Leu Ala Gly Ser Ser Glu Ile
1 5 10 15

Pro Leu Met Thr Met Asn Ile Phe Pro Asn Arg Lys Asp Ser Arg Met
20 25 30

Ser Leu Trp Ile Lys Leu Gly Ile Leu Cys Met Met Ala Gly Thr Val
35 40 45

Met Val His Gly Ala Gln Thr Gly Gln Gly Glu Ala Thr Met Asn Gln
50 55 60

Ala Asn Gly Phe Lys Val Ser Asn Gly Thr Asn Ile Ser His Trp Leu
65 70 75 80

Ser Gln Cys Phe Glu Thr Met Pro Pro Arg Arg Gly Phe Phe Ser Glu
85 90 95

Leu Asp Val Ile Phe Ile Arg Ser Leu Gly Met Asp His Phe Arg Leu
100 105 110

Pro Val Asp Glu Lys Glu Leu Trp Thr Glu Asp Leu Glu Lys Ile Pro
115 120 125

Glu Ala Trp Asp Tyr Leu Arg Asn Ala Leu Ser Trp Ala Arg Lys His
130 135 140

Glu Leu Arg Val Ile Val Asp Leu His Val Val Arg Ser His His Phe
145 150 155 160

Asn Ala Ala Asn Glu Gly Gly Thr Asn Thr Leu Trp Asp Asp Pro Glu
165 170 175

Ala Gln Glu Ser Phe Leu Asn Leu Trp Arg Gln Leu Ser Ala Glu Leu
180 185 190

Ala Tyr Thr Asp Val Asp Trp Val Ala Tyr Glu Ile Met Asn Glu Ala
195 200 205

Val Ala Asp Asp Pro Glu Asp Trp Asn Arg Leu Ile Ala Lys Ala His
210 215 220

Ser Leu Ile Arg Glu Arg Glu Pro Arg Arg Thr Leu Val Ile Gly Ser
225 230 235 240

Asn Arg Trp Gln Ile Pro Ser Thr Phe Pro Asp Leu Lys Ile Pro Asp
245 250 255

Gly Asp Pro Asn Ile Leu Leu Ser Phe His Phe Tyr Ala Pro Leu Leu
260 265 270

Phe Thr His Tyr Arg Ala Thr Trp Val Ala Phe Tyr Asp Tyr Asp Gly
275 280 285

Pro Val Ser Tyr Pro Gly Arg Ile Val Asp Asp Ala Ala Leu Glu Lys
290 295 300

Asn Asp Tyr Thr Pro Ala Phe Lys Asp Lys Ile Arg Ala Leu Asn Gly
305 310 315 320

Val Tyr Asp Ile Asp Ala Leu Glu Lys Glu Met Gln Pro Ala Ile Glu
325 330 335

Tyr Ala Lys Gln Lys Gly Leu Pro Leu Tyr Cys Gly Glu Trp Gly Cys
340 345 350

Phe His Ala Val Glu Arg Lys Gln Arg Leu Gln Trp Tyr Lys Asp Ile
355 360 365

Ser Thr Ile Leu Lys Arg Asn Gly Ile Ala His Ala Thr Trp Asp Tyr
370 375 380

Lys Gly Glu Phe Gly Ile Val Asp Thr Trp Thr Leu Gly Val Asp Trp
385 390 395 400

Asn Leu Val Gly Ala Ile Leu Ser Glu
405



<210> 63 
<211> 1152 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<400> 63 



atgaaacgga gggaattcat gttggggggt gcgggtgttg ctgcgttggc atcgactctt 60

ggagtctccg ccggttccac ttccgggcag ggagtgaacg agaatgtgag ggtataccgg 120

aatgcgattc cccgttggag ggggttcaac ctcatgccct ttttctcggc aatgagcacc 180

aacccggaat acaatggtct gacggtgccg gaggatgacc taaactggat ccgcgactgg 240

ggttttgact atgtccggct tccgattgat tactggattc tggttgattc cgattggcga 300

gatgcaaagc gcatgcgggt agaggatgtt cgcaaggccg accagaaggg atattcacgg 360

ctggacgctg tgattgaagc ctgtatcgcg aagggtttgc acctcaacct gaatatgcat 420

cggtgtcccg ggtattgcat caatggctgg gaactggagc cctataacct cttcaaggat 480

gagcaggcgg aggatgattt tgtctaccat tgggagttgc tcgcgagacg ctataaggga 540

atcgatcctt cgctgctgag tttcaatctg ctgaatgagg ctcccaatcc tggagacaag 600

atgtcgtcgg aggattatcg tcgggtgatg cttcgatccg ctgctgttat tcgggggata 660

agcccggacc gcatgattat tgtggacggg ctggaaatcg gtaaatcagt tgttccaggg 720

ctgatgcatg agccatttgc ccaagctgtt catgcctacg agccccacga gttgagccat 780

tataatgcgc cttggacgtc ggtgtttatg ggtattcctg agccatcctg gccgacagtt 840

cgtttggatg gttctctgtt cgaccgcaag cgactggagt tgtatttcgc gccgtggggg 900

gagttggtcc gccagggggt aggggtccac tgtggggaga ccggttgcta cattcatacg 960

ccccatcggg tgtttctgtc ctggttcgaa gatgttttgg atatcctgac cggatacgac 1020

atagggtggg ctctatggaa tttccgggga gatttcggaa tacttgattc caaacgcaag 1080

gatgtgcaat atgtcgattg gtatggacac cagctcgatc aacgcttgct ggatcttctg 1140

aaatcccact aa 1152

<210> 64 
<211> 383 
<212> PRT
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (24)

<220> 

<221> DOMAIN 
<222> (48) . . . (357) 

<223> Cellulase (glycosyl hydrolase family 5) 
<400> 64 

Met Lys Arg Arg Glu Phe Met Leu Gly Gly Ala Gly Val Ala Ala Leu
1 5 10 15

Ala Ser Thr Leu Gly Val Ser Ala Gly Ser Thr Ser Gly Gln Gly Val
20 25 30

Asn Glu Asn Val Arg Val Tyr Arg Asn Ala Ile Pro Arg Trp Arg Gly
35 40 45

Phe Asn Leu Met Pro Phe Phe Ser Ala Met Ser Thr Asn Pro Glu Tyr
50 55 60

Asn Gly Leu Thr Val Pro Glu Asp Asp Leu Asn Trp Ile Arg Asp Trp
65 70 75 80

Gly Phe Asp Tyr Val Arg Leu Pro Ile Asp Tyr Trp Ile Leu Val Asp
85 90 95

Ser Asp Trp Arg Asp Ala Lys Arg Met Arg Val Glu Asp Val Arg Lys
100 105 110

Ala Asp Gln Lys Gly Tyr Ser Arg Leu Asp Ala Val Ile Glu Ala Cys
115 120 125

Ile Ala Lys Gly Leu His Leu Asn Leu Asn Met His Arg Cys Pro Gly
130 135 140

Tyr Cys Ile Asn Gly Trp Glu Leu Glu Pro Tyr Asn Leu Phe Lys Asp
145 150 155 160

Glu Gln Ala Glu Asp Asp Phe Val Tyr His Trp Glu Leu Leu Ala Arg
165 170 175

Arg Tyr Lys Gly Ile Asp Pro Ser Leu Leu Ser Phe Asn Leu Leu Asn
180 185 190

Glu Ala Pro Asn Pro Gly Asp Lys Met Ser Ser Glu Asp Tyr Arg Arg
195 200 205

Val Met Leu Arg Ser Ala Ala Val Ile Arg Gly Ile Ser Pro Asp Arg
210 215 220

Met Ile Ile Val Asp Gly Leu Glu Ile Gly Lys Ser Val Val Pro Gly
225 230 235 240

Leu Met His Glu Pro Phe Ala Gln Ala Val His Ala Tyr Glu Pro His
245 250 255

Glu Leu Ser His Tyr Asn Ala Pro Trp Thr Ser Val Phe Met Gly Ile
260 265 270

Pro Glu Pro Ser Trp Pro Thr Val Arg Leu Asp Gly Ser Leu Phe Asp
275 280 285

Arg Lys Arg Leu Glu Leu Tyr Phe Ala Pro Trp Gly Glu Leu Val Arg
290 295 300

Gln Gly Val Gly Val His Cys Gly Glu Thr Gly Cys Tyr Ile His Thr
305 310 315 320

Pro His Arg Val Phe Leu Ser Trp Phe Glu Asp Val Leu Asp Ile Leu
325 330 335

Thr Gly Tyr Asp Ile Gly Trp Ala Leu Trp Asn Phe Arg Gly Asp Phe
340 345 350

Gly Ile Leu Asp Ser Lys Arg Lys Asp Val Gln Tyr Val Asp Trp Tyr
355 360 365

Gly His Gln Leu Asp Gln Arg Leu Leu Asp Leu Leu Lys Ser His
370 375 380

<210> 65 
<211> 1131 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 



<400> 65 



atgaacacac tcctaccacg gcggcgactg tggtcctcca cggcgatcct gcgcacgctg 60

gcggccgggg cgctggcggc cggtatggtc ctggcacccg tcagtgccgc caacgcggcc 120

accaccctcg gtgcctcggc ggcggagaag ggccggtact tcggtgcggc cgtcgggacg 180

tacaagttca acgacagcac ctacatgtcg gtgctgaacc gcgagttcaa cagcctggtc 240

gccgagaacg agatgaagtg ggacgcgacc gagccccagc gcggcgtgtt caactacagc 300

gccggggacc gcatcgtcaa ccacgcccga tcccagggca tgaaggtacg cggacacgcc 360

ctgttgtggc acgcccagca gccacgctgg acggagggcc tgtccggcgg cgacctgcgc 420

aacgccgcga tcaaccatgt cacccaggtg gccagccact tccgggggca gatctactcc 480

tgggacgtgg tgaacgaggc tttcgccgac ggtggcagcg gtgcccggcg ggactcgaac 540

ctccagcgca ccggcaacga ctggatcgag gcggcgttcc gtgccgcccg ggcagccgat 600

cccaacgcca agctctgcta caacgactac aacaccgacg ggatcaacgc gaagtccacc 660

ggcgtctaca acatggtgcg tgacttcaag tcccgtgggg tgccgatcga ctgcgtgggc 720

ttccagtcac acctgggcac caccctcccc ggtgactacc aggccaacct tcagcgcttc 780

gccgacctgg gcgtcgacgt ggagatcacc gagctggaca tcacccaggg cggaaaccag 840

gccaacatgt acggcgccgt cacccgcgcc tgcctggcga tctcgcgctg caccggcatc 900

accgtgtggg gggtacggga ctgcgactcc tggcgtggtg gggacaacgc cctgctgttc 960

gactgcgccg gcaacaagaa gcccgcgtac acggccgtcc tcgacgccct caacagcggc 1020

tcgaacccga accccaaccc caccggcaac cggctgcgca acgaggcctc cggtcgatgc 1080

ctggacgtca acggcgcaag ctccgccaac gggtcacaaa tgatccaaag a 1131



<210> 66 
<211> 377 
<212> PRT
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (39)

<220> 

<221> DOMAIN 
<222> (42) . . . (337) 

<223> Glycosyl hydrolase family 10 
<220> 

<221> SITE 

<222> (99) . . . (102) 

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (268) . . . (278) 

<223> Glycosyl hydrolases family 10 active site. Prosite id = PS00591 

<220> 

<221> SITE 

<222> (375) . . . (378) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 66 

Met Asn Thr Leu Leu Pro Arg Arg Arg Leu Trp Ser Ser Thr Ala Ile
1 5 10 15

Leu Arg Thr Leu Ala Ala Gly Ala Leu Ala Ala Gly Met Val Leu Ala
20 25 30

Pro Val Ser Ala Ala Asn Ala Ala Thr Thr Leu Gly Ala Ser Ala Ala
35 40 45

Glu Lys Gly Arg Tyr Phe Gly Ala Ala Val Gly Thr Tyr Lys Phe Asn
50 55 60

Asp Ser Thr Tyr Met Ser Val Leu Asn Arg Glu Phe Asn Ser Leu Val
65 70 75 80

Ala Glu Asn Glu Met Lys Trp Asp Ala Thr Glu Pro Gln Arg Gly Val
85 90 95

Phe Asn Tyr Ser Ala Gly Asp Arg Ile Val Asn His Ala Arg Ser Gln
100 105 110

Gly Met Lys Val Arg Gly His Ala Leu Leu Trp His Ala Gln Gln Pro
115 120 125

Arg Trp Thr Glu Gly Leu Ser Gly Gly Asp Leu Arg Asn Ala Ala Ile
130 135 140

Asn His Val Thr Gln Val Ala Ser His Phe Arg Gly Gln Ile Tyr Ser
145 150 155 160

Trp Asp Val Val Asn Glu Ala Phe Ala Asp Gly Gly Ser Gly Ala Arg
165 170 175

Arg Asp Ser Asn Leu Gln Arg Thr Gly Asn Asp Trp Ile Glu Ala Ala
180 185 190

Phe Arg Ala Ala Arg Ala Ala Asp Pro Asn Ala Lys Leu Cys Tyr Asn
195 200 205

Asp Tyr Asn Thr Asp Gly Ile Asn Ala Lys Ser Thr Gly Val Tyr Asn
210 215 220

Met Val Arg Asp Phe Lys Ser Arg Gly Val Pro Ile Asp Cys Val Gly
225 230 235 240

Phe Gln Ser His Leu Gly Thr Thr Leu Pro Gly Asp Tyr Gln Ala Asn
245 250 255

Leu Gln Arg Phe Ala Asp Leu Gly Val Asp Val Glu Ile Thr Glu Leu
260 265 270

Asp Ile Thr Gln Gly Gly Asn Gln Ala Asn Met Tyr Gly Ala Val Thr
275 280 285

Arg Ala Cys Leu Ala Ile Ser Arg Cys Thr Gly Ile Thr Val Trp Gly
290 295 300

Val Arg Asp Cys Asp Ser Trp Arg Gly Gly Asp Asn Ala Leu Leu Phe
305 310 315 320

Asp Cys Ala Gly Asn Lys Lys Pro Ala Tyr Thr Ala Val Leu Asp Ala
325 330 335

Leu Asn Ser Gly Ser Asn Pro Asn Pro Asn Pro Thr Gly Asn Arg Leu
340 345 350

Arg Asn Glu Ala Ser Gly Arg Cys Leu Asp Val Asn Gly Ala Ser Ser
355 360 365

Ala Asn Gly Ser Gln Met Ile Gln Arg
370 375

<210> 67 
<211> 1023 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample
<400> 67 



atgaaatata 


tattttccta


udUddUcicLug 


atgattttaa tcggttttat accggtctat 


60 




d L. L.v«ci(^c Oy ct 


ccaaacatac 


tctctcccct tcctcagcgt agaaggaaat 


120 




ai^y cia.cLcxu.y y 


tgaggaggtt 


attttgcggg gtgtatcgtt tcccgatccc


180 


^A^fjL L# o u d u« ^y sj 


a i" rra t" rxr* a n 


tcaatggaac 


aaacggtatt tccaggcagc aaaagattgg 


240 


aactgtaatg 


tCQtcaaaat" 

w Vp^ <^ V^CL^ d-CL 


3 f^e^t^ni' f~ -H 
d^i^yy Ul^^dU 


ccgcaaagat ggcgggaaag gggaaaagaa 


300 


aattatctga 


CLCaV.* U U L« UCl^CL 


^ 2a 21 /*T/~T/^ ^ S +— j-i 

tddyyy catc 


gagtgggccg gtgaactcgg tatgtacgtg 


360 


atcattgact ggcacactat cggcaatccg attaccgaag tgttcttcgg cgagctctat 420

aatacgaccc agaccgaaac gttccggttc tggagaacaa tagcggagcg atatgcaggt 480

aatcccgttg ttgcatttta tgaattgttt aatgaaccga ccgattataa cggtcggctc 540

gggaggatga cctgggatca atataaagaa ttcatcgaag agatcattta tataatttat 600

gcacacgacg aaaccgtgat accgcttgta ggcggtttcg attggggata tgatctcagg 660

aatgttagag ataatccgat aaatgccccg ggtatcgcgt atgttactca cccgtatccg 720

caaaagcggg accaaccgtg ggaagaaaaa tgggaaaggg atttcggttt cgtagccgac 780

acctaccctg tgtttgctac cgagttcgga tttatgagtg aggatggttt gggtgcacat 840

attcccgtta tcggtgatga aacatacggt gaagcgatca tcagttactt caatgagaaa 900

ggtatatcgt ggacggcctg ggtgttcgat ccgctctggt cgccgcagct tattaaagac 960

tggtatttta ccccgacccg gcagggacag ttttttaaag agaagctaat ggagttgaat 1020

taa 1023



<210> 68 
<211> 340 
<212> PRT 
<213> Unknown

<220>

<223> Obtained from environmental sample
<220>

<221> SIGNAL
<222> (1) . . . (23)
<220>

<221> DOMAIN
<222> (40) . . . (317)

<223> Cellulase (glycosyl hydrolase family 5)
<220>

<221> SITE

<222> (143) . . . (146)

<223> N-glycosylation site. Prosite id = PS00001
<400> 68 

Met Lys Tyr Ile Phe Ser Tyr Ile Ile Met Met Ile Leu Ile Gly Phe
1 5 10 15

Ile Pro Val Tyr Gly Phe Gly Asp Ser Pro Asp Gln Thr Tyr Ser Leu
20 25 30

Pro Phe Leu Ser Val Glu Gly Asn Ser Phe Val Asp Glu Asn Gly Glu
35 40 45

Glu Val Ile Leu Arg Gly Val Ser Phe Pro Asp Pro Asn Arg Leu Asp
50 55 60

Asp Ala Thr Gln Trp Asn Lys Arg Tyr Phe Gln Ala Ala Lys Asp Trp
65 70 75 80

Asn Cys Asn Val Val Arg Ile Pro Val His Pro Gln Arg Trp Arg Glu
85 90 95

Arg Gly Lys Glu Asn Tyr Leu Lys Leu Leu Asp Lys Gly Ile Glu Trp
100 105 110

Ala Gly Glu Leu Gly Met Tyr Val Ile Ile Asp Trp His Thr Ile Gly
115 120 125

Asn Pro Ile Thr Glu Val Phe Phe Gly Glu Leu Tyr Asn Thr Thr Gln
130 135 140

Thr Glu Thr Phe Arg Phe Trp Arg Thr Ile Ala Glu Arg Tyr Ala Gly
145 150 155 160

Asn Pro Val Val Ala Phe Tyr Glu Leu Phe Asn Glu Pro Thr Asp Tyr
165 170 175

Asn Gly Arg Leu Gly Arg Met Thr Trp Asp Gln Tyr Lys Glu Phe Ile
180 185 190

Glu Glu Ile Ile Tyr Ile Ile Tyr Ala His Asp Glu Thr Val Ile Pro
195 200 205

Leu Val Gly Gly Phe Asp Trp Gly Tyr Asp Leu Arg Asn Val Arg Asp
210 215 220

Asn Pro Ile Asn Ala Pro Gly Ile Ala Tyr Val Thr His Pro Tyr Pro
225 230 235 240

Gln Lys Arg Asp Gln Pro Trp Glu Glu Lys Trp Glu Arg Asp Phe Gly
245 250 255

Phe Val Ala Asp Thr Tyr Pro Val Phe Ala Thr Glu Phe Gly Phe Met
260 265 270

Ser Glu Asp Gly Leu Gly Ala His Ile Pro Val Ile Gly Asp Glu Thr
275 280 285

Tyr Gly Glu Ala Ile Ile Ser Tyr Phe Asn Glu Lys Gly Ile Ser Trp
290 295 300

Thr Ala Trp Val Phe Asp Pro Leu Trp Ser Pro Gln Leu Ile Lys Asp
305 310 315 320

Trp Tyr Phe Thr Pro Thr Arg Gln Gly Gln Phe Phe Lys Glu Lys Leu
325 330 335

Met Glu Leu Asn
340



<210> 69 
<211> 1182 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 69 



atgagtttta aaaaccacat acttttgtcg ctcctcatag tattgctttt cttttcagcg 60

tgcgatatcg aagaaccgat cgccggagat tatcatacac ttgtggatca aaacgctata 120

tcgcacaccc gcgcattatt caccaacctc gaacgtatcc gtcacgatca tatcctcttc 180

ggtcatcagg atgcgcttgc atacggtgtt cactggcgca acgatgagcc gggtcgatcg 240

gatgtattcg aagtaaccgg ttcgtatcct gcggtgtatg gctgggagat tggcgatatt 300

gaacttggtg caccggaaaa tctggataac gtaaacttcg atcaaatgca gggctggatt 360

cgcgaagggt acgaacgcgg cggtataatt acgattagct ggcatatgaa caatccggca 420

tcgggtggtg attcgtggga tgtgaatgga ggtcataaag cggtaactaa gatacttccc 480

ggcggagaac ttcacgatac gtttaaagaa tggctggata cgtttgcaaa attcgcgaag 540

agccagattg cttttcccga aacaaataat gaacacctta tcccggtcat attccggccg 600

tatcatgaaa acaccggaag ctggttctgg tggggcgccg accactgtac acctgaagaa 660

tataaaaagt tatggcgatt taccgtcgaa tacctgcgcg atgtaaaagg tgttcacaat 720

ctcctctggg cgtattcacc tgccggcaat gctgcggatt cagaggaagc atattttgct 780

cggtatcccg gcgacgacta tgttgatatt attggattcg acgattacgg cagtgtgcgg 840

aaaccgtatc aaatcgaacg ttttactaac cggattcgaa cgattgtaaa cttcgccgaa 900

gcacgaaata aaatcccggc aataacggaa accggctatg aaactatccc cgatccgcaa 960

tggtggacgg gtacattgct tagtgcactt gatcacgatt tgacaacccg gagaatagca 1020

tacgtacttg tgtggcgaaa ttcaaacaat gctaccgacc ggcagaatca ttattacgct 1080

ccgtatcccg gacatccaag tgctgacgat tttatcgcgt tcaggaatca cccgttgata 1140

gttttcgaag atgatctgcc gggtatgtat acactaccgt aa 1182



<210> 70 
<211> 393 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (20)

<220> 

<221> DOMAIN 
<222> (35) . . . (351) 

<223> Glycosyl hydrolase family 26 
<220> 

<221> SITE 

<222> (355) . . . (358)

<223> N-glycosylation site. Prosite id = PS00001
<400> 70 

Met Ser Phe Lys Asn His Ile Leu Leu Ser Leu Leu Ile Val Leu Leu
1 5 10 15

Phe Phe Ser Ala Cys Asp Ile Glu Glu Pro Ile Ala Gly Asp Tyr His
20 25 30

Thr Leu Val Asp Gln Asn Ala Ile Ser His Thr Arg Ala Leu Phe Thr
35 40 45

Asn Leu Glu Arg Ile Arg His Asp His Ile Leu Phe Gly His Gln Asp
50 55 60

Ala Leu Ala Tyr Gly Val His Trp Arg Asn Asp Glu Pro Gly Arg Ser
65 70 75 80

Asp Val Phe Glu Val Thr Gly Ser Tyr Pro Ala Val Tyr Gly Trp Glu
85 90 95

Ile Gly Asp Ile Glu Leu Gly Ala Pro Glu Asn Leu Asp Asn Val Asn
100 105 110

Phe Asp Gln Met Gln Gly Trp Ile Arg Glu Gly Tyr Glu Arg Gly Gly
115 120 125

Ile Ile Thr Ile Ser Trp His Met Asn Asn Pro Ala Ser Gly Gly Asp
130 135 140

Ser Trp Asp Val Asn Gly Gly His Lys Ala Val Thr Lys Ile Leu Pro
145 150 155 160

Gly Gly Glu Leu His Asp Thr Phe Lys Glu Trp Leu Asp Thr Phe Ala
165 170 175

Lys Phe Ala Lys Ser Gln Ile Ala Phe Pro Glu Thr Asn Asn Glu His
180 185 190

Leu Ile Pro Val Ile Phe Arg Pro Tyr His Glu Asn Thr Gly Ser Trp
195 200 205

Phe Trp Trp Gly Ala Asp His Cys Thr Pro Glu Glu Tyr Lys Lys Leu
210 215 220

Trp Arg Phe Thr Val Glu Tyr Leu Arg Asp Val Lys Gly Val His Asn
225 230 235 240

Leu Leu Trp Ala Tyr Ser Pro Ala Gly Asn Ala Ala Asp Ser Glu Glu
245 250 255

Ala Tyr Phe Ala Arg Tyr Pro Gly Asp Asp Tyr Val Asp Ile Ile Gly
260 265 270

Phe Asp Asp Tyr Gly Ser Val Arg Lys Pro Tyr Gln Ile Glu Arg Phe
275 280 285

Thr Asn Arg Ile Arg Thr Ile Val Asn Phe Ala Glu Ala Arg Asn Lys
290 295 300

Ile Pro Ala Ile Thr Glu Thr Gly Tyr Glu Thr Ile Pro Asp Pro Gln
305 310 315 320

Trp Trp Thr Gly Thr Leu Leu Ser Ala Leu Asp His Asp Leu Thr Thr
325 330 335

Arg Arg Ile Ala Tyr Val Leu Val Trp Arg Asn Ser Asn Asn Ala Thr
340 345 350

Asp Arg Gln Asn His Tyr Tyr Ala Pro Tyr Pro Gly His Pro Ser Ala
355 360 365

Asp Asp Phe Ile Ala Phe Arg Asn His Pro Leu Ile Val Phe Glu Asp
370 375 380

Asp Leu Pro Gly Met Tyr Thr Leu Pro
385 390



<210> 71 
<211> 1089 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 71 

atgaaacttt taaaactttt aatctttctc cttattacgg taattttttc tgatgtttcg 60 

gctcaaactt ttcaaataca aaaaggcaag aacattagcc attggctgtc ccaaagtaaa 120 

agaaggggag aagagcgaaa agagttcttt actaagaatg acgtagaatt tattgcaggc 180 

atcggcttcg atcatattcg tattcctatt gacgaggagc aaatgtggga tgaaaaaggc 240 

aacaaagagc ctgaagcgtt tcagttgctg cacaacgcga tagaatggag caggcaatcg 3 00 

aacttaaaag ttattgtgga cctgcatatt ttgaggtcgc attatttcaa cgcggaagaa 360 

aaaccgcttt ttacggaccc taaagctcag gaacgttttt accaatgttg ggcggatctg 420 

tctggtgaat tgaaaaaata tccgaataca ctggtggctt atgaattaat gaacgaacct 480 

gtagccgatg atccggaaga ctggaataga attgtaagag aatcagtaaa aaggctaagg 540 

gtgcttgagc ccaatagggt tattgtaatc gggtctaacc gatggcagca ttatgacact 600 

ctgaaggatt tatacgtgcc ggaaaacgac aaaaacatca ttttaagctt tcatttttat 660 

aaccctatgt tgcttacgca ttacagggcc agctgggtaa atttcggcga ttaccagggt 720 
cccgttaact acccgggaca gttggtagac tcaaagcatt tgtcgggact gagcgaagat 780

ttaagaaaga aagtcgagca aaacaatggc gtttataata aggctcggat tgagaaaatg 840

atagccgaag ccgttgctgt agcaaaaaag cacaacctcc ctttgtattg tggtgaatgg 900

ggtgcctacg aaaaagcgcc aagggagccc aggctacaat ggtacagaga catggtggat 960

gtgttgaaca aaaacaatat tgcctggact acctgggact ataaaggagg cttcggcata 1020

gttgacgcca aaggaaacaa agacgaacag ttgatcaatg tattaacagg aaaagagaaa 1080

aaaatgtaa 1089



<210> 72 
<211> 362 
<212> PRT 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (21) 

<220> 

<221> DOMAIN 
<222> (22) . . . (340) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (31) . . . (34) 

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (154) . . . (163)

<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659 
<400> 72 

Met Lys Leu Leu Lys Leu Leu Ile Phe Leu Leu Ile Thr Val Ile Phe
1 5 10 15

Ser Asp Val Ser Ala Gln Thr Phe Gln Ile Gln Lys Gly Lys Asn Ile
20 25 30

Ser His Trp Leu Ser Gln Ser Lys Arg Arg Gly Glu Glu Arg Lys Glu
35 40 45

Phe Phe Thr Lys Asn Asp Val Glu Phe Ile Ala Gly Ile Gly Phe Asp
50 55 60

His Ile Arg Ile Pro Ile Asp Glu Glu Gln Met Trp Asp Glu Lys Gly
65 70 75 80

Asn Lys Glu Pro Glu Ala Phe Gln Leu Leu His Asn Ala Ile Glu Trp
85 90 95

Ser Arg Gln Ser Asn Leu Lys Val Ile Val Asp Leu His Ile Leu Arg
100 105 110

Ser His Tyr Phe Asn Ala Glu Glu Lys Pro Leu Phe Thr Asp Pro Lys
115 120 125

Ala Gln Glu Arg Phe Tyr Gln Cys Trp Ala Asp Leu Ser Gly Glu Leu
130 135 140

Lys Lys Tyr Pro Asn Thr Leu Val Ala Tyr Glu Leu Met Asn Glu Pro
145 150 155 160

Val Ala Asp Asp Pro Glu Asp Trp Asn Arg Ile Val Arg Glu Ser Val
165 170 175

Lys Arg Leu Arg Val Leu Glu Pro Asn Arg Val Ile Val Ile Gly Ser
180 185 190

Asn Arg Trp Gln His Tyr Asp Thr Leu Lys Asp Leu Tyr Val Pro Glu
195 200 205

Asn Asp Lys Asn Ile Ile Leu Ser Phe His Phe Tyr Asn Pro Met Leu
210 215 220

Leu Thr His Tyr Arg Ala Ser Trp Val Asn Phe Gly Asp Tyr Gln Gly
225 230 235 240

Pro Val Asn Tyr Pro Gly Gln Leu Val Asp Ser Lys His Leu Ser Gly
245 250 255

Leu Ser Glu Asp Leu Arg Lys Lys Val Glu Gln Asn Asn Gly Val Tyr
260 265 270

Asn Lys Ala Arg Ile Glu Lys Met Ile Ala Glu Ala Val Ala Val Ala
275 280 285

Lys Lys His Asn Leu Pro Leu Tyr Cys Gly Glu Trp Gly Ala Tyr Glu
290 295 300

Lys Ala Pro Arg Glu Pro Arg Leu Gln Trp Tyr Arg Asp Met Val Asp
305 310 315 320

Val Leu Asn Lys Asn Asn Ile Ala Trp Thr Thr Trp Asp Tyr Lys Gly
325 330 335

Gly Phe Gly Ile Val Asp Ala Lys Gly Asn Lys Asp Glu Gln Leu Ile
340 345 350

Asn Val Leu Thr Gly Lys Glu Lys Lys Met
355 360



<210> 73 
<211> 1146 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 73 

gtggatatta ccggacatcc cgaccacatc gccttcgcgc gggaagttgc cgagcaaagc 60 

atggtcttgc tgcaaaaccg tgcgaacctc gccccccttt cggtatctga ctattccacc 120 

attgccgtga tcggcccgaa tgccaatgac actttgctgg gttcttacag cggcgttccg 180 

aaaacctact acacggtact cgacgggata cggtcctatg tcggtgaccg ggcgaatgtg 240

gtttacgctc aggggccgaa gataaccaaa cccggccatc gggaggacaa tgaagtattt 300 

ccaccggatc ctgaaaacga ccggagacga ctggccgaag cgatagctgt cgccgagaac 360 

gccgacctga tcatcctcgc gatcggcggc aatgaactga cgggacgaga ggcatgggcg 420 

gcgcatcatc ccggtgatcg accggatctg tcgttgctcg gtttgcagga ggatcttgtt 480 

gacgcagttg gagcgatggg ggttccatct gtcgcattgg ttttcggtgc acggccgctg 540 

gacctcggca atgtcgccga aaaaattgat gtggtcttcc aaaactggta cctgggccag 600 

gaaaccggca atgccgtcgc caatgtgctg tttggcgagg tgtcaccgtc cgccaaactc 660 

cccatcagct tcccgcggac tgccgggcac attcctgcct actacaatta caaaccatcg 720 

gctcgacggg tctacctttt tgacgatgtc actccgcgtt accatttcgg gtacggcctc 780 

agctatacga cgtttgaata cggggaaccg cagctatcgg atacactact gtctggcgat 840 

ggtgaaataa ccctctacgt tgaagttacc aacaccggag agcgaggcgg ttcggaagtc 900 

gtgcaactgt acatcaacca cgaatacaga tccgtcaccc ggccggtaaa ggagctcaag 960 

ggattcgaaa aggtgtatct cgagccgaat gaaactgccg gtgtatcgtt caccatcact 1020 

tcagatcagt tgaggttctg gaatatcgac atggagttta ccgctgaatc cggtaaagtg 1080 

aacctgatgg tcggctcatc cagccgtgac gaagacctgc agacgacggc aatttttctt 1140 

gaataa 1146

<210> 74
<211> 381
<212> PRT
<213> Unknown

<220>

<223> Obtained from environmental sample
<220>

<221> DOMAIN
<222> (21) . . . (264)

<223> Glycosyl hydrolase family 3 C terminal domain
<220>

<221> SITE

<222> (49) . . . (52)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE

<222> (335) . . . (338)

<223> N-glycosylation site. Prosite id = PS00001
<400> 74

Met Asp Ile Thr Gly His Pro Asp His Ile Ala Phe Ala Arg Glu Val
1 5 10 15



Ala Glu Gln Ser Met Val Leu Leu Gln Asn Arg Ala Asn Leu Ala Pro
20 25 30

Leu Ser Val Ser Asp Tyr Ser Thr Ile Ala Val Ile Gly Pro Asn Ala
35 40 45

Asn Asp Thr Leu Leu Gly Ser Tyr Ser Gly Val Pro Lys Thr Tyr Tyr
50 55 60

Thr Val Leu Asp Gly Ile Arg Ser Tyr Val Gly Asp Arg Ala Asn Val
65 70 75 80

Val Tyr Ala Gln Gly Pro Lys Ile Thr Lys Pro Gly His Arg Glu Asp
85 90 95

Asn Glu Val Phe Pro Pro Asp Pro Glu Asn Asp Arg Arg Arg Leu Ala
100 105 110

Glu Ala Ile Ala Val Ala Glu Asn Ala Asp Leu Ile Ile Leu Ala Ile
115 120 125

Gly Gly Asn Glu Leu Thr Gly Arg Glu Ala Trp Ala Ala His His Pro
130 135 140

Gly Asp Arg Pro Asp Leu Ser Leu Leu Gly Leu Gln Glu Asp Leu Val
145 150 155 160

Asp Ala Val Gly Ala Met Gly Val Pro Ser Val Ala Leu Val Phe Gly
165 170 175

Ala Arg Pro Leu Asp Leu Gly Asn Val Ala Glu Lys Ile Asp Val Val
180 185 190

Phe Gln Asn Trp Tyr Leu Gly Gln Glu Thr Gly Asn Ala Val Ala Asn
195 200 205

Val Leu Phe Gly Glu Val Ser Pro Ser Ala Lys Leu Pro Ile Ser Phe
210 215 220

Pro Arg Thr Ala Gly His Ile Pro Ala Tyr Tyr Asn Tyr Lys Pro Ser
225 230 235 240

Ala Arg Arg Val Tyr Leu Phe Asp Asp Val Thr Pro Arg Tyr His Phe
245 250 255

Gly Tyr Gly Leu Ser Tyr Thr Thr Phe Glu Tyr Gly Glu Pro Gln Leu
260 265 270

Ser Asp Thr Leu Leu Ser Gly Asp Gly Glu Ile Thr Leu Tyr Val Glu
275 280 285

Val Thr Asn Thr Gly Glu Arg Gly Gly Ser Glu Val Val Gln Leu Tyr
290 295 300

Ile Asn His Glu Tyr Arg Ser Val Thr Arg Pro Val Lys Glu Leu Lys
305 310 315 320

Gly Phe Glu Lys Val Tyr Leu Glu Pro Asn Glu Thr Ala Gly Val Ser
325 330 335

Phe Thr Ile Thr Ser Asp Gln Leu Arg Phe Trp Asn Ile Asp Met Glu
340 345 350

Phe Thr Ala Glu Ser Gly Lys Val Asn Leu Met Val Gly Ser Ser Ser
355 360 365

Arg Asp Glu Asp Leu Gln Thr Thr Ala Ile Phe Leu Glu
370 375 380



<210> 75 
<211> 1014 
<212> DNA
<213> Unknown

<220>

<223> Obtained from environmental sample
<400> 75 



atgctgcgca agttgatcgt ctcggtcttc ggcttcgtca tgctgactag tgcggcagcg 60

gcgcagactc ctcccgcctt agcggaatcc gcgcctgctc tccggcgcgg aatgaacgtt 120

ctgggctacg acccaatctg gcacgacccg aagaaaggtc ggttcgaaga gcggcacttc 180

gccgagattc gcaagggcgg cttcgacttc gttcgggtga acctccacgg gttcaaacat 240

atgaacgccg cggacaaact cagtccggag ttcctgagcc gcgtggactg gatcgtgaag 300

cacgccagtg cggcgggcct gtcggtcatc ctagacgagc atgaatatga ggaatgctcg 360

gacgacgtcg caatgtgccg gcggcgtttg gcggcattct ggacgcaggt cgcgccgcgc 420

tacaagggcg cgcccgatac ggttctgttc gagcttctca atgagccgca cgacaagttg 480

gatgccgaca cctggaacgc cttgtttccc gacatcctgg ccatcgtgcg gcagtcgaac 540

ccgaagcgcc gcgtggtgat cggcccgact cagtggaaca acttcagcca gctggacacg 600

ctcaagctgc cggcagacga ccggaacatc gtcgtcacct tccattatta cgatccgttc 660

ccgtttaccc accagggcgc gccgtgggtt ccggacatgc tcaaggtgaa aggcatcgag 720

tggaagcccg agcagagggc gaagatcgcc gaggacttcg gcaaggtcgc ggaatggtcg 780

cagaaaaccg gccgcgaaat cttgctcggc gagttcgggg cctacgatgt gagcggtacg 840

ccaaccgcca tgcgttcagc ttatacggaa gcggtggcgc gcgaggcgga acgccacggc 900

ttcgcttggg cctactggca gttcgacagc aatttcctgg cttgggacat gaagacaaac 960

ggctgggtcg agccgatcca caaggcactc atccccgagg cgaagcagcc ttag 1014



<210> 76 
<211> 337 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (37) . . . (316) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (150) . . . (159) 

<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659 
<400> 76 

Met Leu Arg Lys Leu Ile Val Ser Val Phe Gly Phe Val Met Leu Thr
1 5 10 15



Ser Ala Ala Ala Ala Gln Thr Pro Pro Ala Leu Ala Glu Ser Ala Pro
20 25 30

Ala Leu Arg Arg Gly Met Asn Val Leu Gly Tyr Asp Pro Ile Trp His
35 40 45

Asp Pro Lys Lys Gly Arg Phe Glu Glu Arg His Phe Ala Glu Ile Arg
50 55 60

Lys Gly Gly Phe Asp Phe Val Arg Val Asn Leu His Gly Phe Lys His
65 70 75 80

Met Asn Ala Ala Asp Lys Leu Ser Pro Glu Phe Leu Ser Arg Val Asp
85 90 95

Trp Ile Val Lys His Ala Ser Ala Ala Gly Leu Ser Val Ile Leu Asp
100 105 110

Glu His Glu Tyr Glu Glu Cys Ser Asp Asp Val Ala Met Cys Arg Arg
115 120 125

Arg Leu Ala Ala Phe Trp Thr Gln Val Ala Pro Arg Tyr Lys Gly Ala
130 135 140

Pro Asp Thr Val Leu Phe Glu Leu Leu Asn Glu Pro His Asp Lys Leu
145 150 155 160

Asp Ala Asp Thr Trp Asn Ala Leu Phe Pro Asp Ile Leu Ala Ile Val
165 170 175

Arg Gln Ser Asn Pro Lys Arg Arg Val Val Ile Gly Pro Thr Gln Trp
180 185 190

Asn Asn Phe Ser Gln Leu Asp Thr Leu Lys Leu Pro Ala Asp Asp Arg
195 200 205

Asn Ile Val Val Thr Phe His Tyr Tyr Asp Pro Phe Pro Phe Thr His
210 215 220

Gln Gly Ala Pro Trp Val Pro Asp Met Leu Lys Val Lys Gly Ile Glu
225 230 235 240

Trp Lys Pro Glu Gln Arg Ala Lys Ile Ala Glu Asp Phe Gly Lys Val
245 250 255

Ala Glu Trp Ser Gln Lys Thr Gly Arg Glu Ile Leu Leu Gly Glu Phe
260 265 270

Gly Ala Tyr Asp Val Ser Gly Thr Pro Thr Ala Met Arg Ser Ala Tyr
275 280 285

Thr Glu Ala Val Ala Arg Glu Ala Glu Arg His Gly Phe Ala Trp Ala
290 295 300

Tyr Trp Gln Phe Asp Ser Asn Phe Leu Ala Trp Asp Met Lys Thr Asn
305 310 315 320

Gly Trp Val Glu Pro Ile His Lys Ala Leu Ile Pro Glu Ala Lys Gln
325 330 335

Pro



<210> 77 
<211> 1125 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 77 

atgaaaagga aacgggtttt tattcattct ctaatcgtat tttttttaat gattggttct 60 

tttacttctt gtggatcagt cgccgatgat gccgaagaag ggtttgatat ttttagagga 120 

accaatatcg ctcattggtt atcacaaagt aatgcaaggg gcgaagagcg aaaaaatttc 180 

tttaccgaaa atgatataaa atttattgct gatgctggtt ttgatcatat tcgtttgcca 240 

attgacgagg ttcatttctg ggatgagaat atgaaccggc accaagatgc atttgatctt 300 

atgcatgact gtattaagtg gtcagagaaa catggtctta gggttgtagt ggatttgcat 360 

attattcgtt cacattattt tgttggagat gataatacac tatgggatga aagacatgaa 420 

caggaaaagt ttgttgatat ttggatggag ttatcatctg aactatctca atattcaaac 480 

tcattagtag cttatgagtt aatgaatgaa cctgtagccc cttctcatga tgattggaat 540 

agtttggttg cggaaactat agaggcaatt cgtaaagttg aacctgagag atatattgta 600 

gttggatcaa atatgtggca aggtattgat acatttgagt atttggaagt tcccgaaaat 660 

gatgatagaa taattcttag ttttcatttt tatgatccct ttattttgac tcattatact 720 

gcatcttggg ggtatttaag agattactca gggcctgtta actatccggg atatcttgtt 780 
acaaatgacc agctgttgga tatgtcaaac gaaatgcaaa agttaattag ggagtttcag 840

acaaattttg atatttatac cattgaagaa ctgatatcta ttccatatag tattgcaaag 900

gaaaaagggt tgaaattata ttgtggagag tttggtgcaa ttgatcaggc tccaagagat 960

gcgagattgg catggtacag agatgttgtt caggtttttg agcgatatgg tatagctcat 1020

gccaactgga attacaaaga ttatggtacg tttgggataa agaactatag cgaggagata 1080

gatcaggaac tgtttgaaat cttaattgga acaaaacata aatag 1125



<210> 78 
<211> 374 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (28)

<220> 

<221> DOMAIN 
<222> (25) . . . (353) 

<223> Cellulase (glycosyl hydrolase family 5)
<220> 

<221> SITE 

<222> (165) . . . (174) 

<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659 
<220> 

<221> SITE 

<222> (360) . . . (363)

<223> N-glycosylation site. Prosite id = PS00001
<400> 78 

Met Lys Arg Lys Arg Val Phe Ile His Ser Leu Ile Val Phe Phe Leu
1 5 10 15

Met Ile Gly Ser Phe Thr Ser Cys Gly Ser Val Ala Asp Asp Ala Glu
20 25 30

Glu Gly Phe Asp Ile Phe Arg Gly Thr Asn Ile Ala His Trp Leu Ser
35 40 45

Gln Ser Asn Ala Arg Gly Glu Glu Arg Lys Asn Phe Phe Thr Glu Asn
50 55 60

Asp Ile Lys Phe Ile Ala Asp Ala Gly Phe Asp His Ile Arg Leu Pro
65 70 75 80

Ile Asp Glu Val His Phe Trp Asp Glu Asn Met Asn Arg His Gln Asp
85 90 95

Ala Phe Asp Leu Met His Asp Cys Ile Lys Trp Ser Glu Lys His Gly
100 105 110

Leu Arg Val Val Val Asp Leu His Ile Ile Arg Ser His Tyr Phe Val
115 120 125

Gly Asp Asp Asn Thr Leu Trp Asp Glu Arg His Glu Gln Glu Lys Phe
130 135 140

Val Asp Ile Trp Met Glu Leu Ser Ser Glu Leu Ser Gln Tyr Ser Asn
145 150 155 160

Ser Leu Val Ala Tyr Glu Leu Met Asn Glu Pro Val Ala Pro Ser His
165 170 175

Asp Asp Trp Asn Ser Leu Val Ala Glu Thr Ile Glu Ala Ile Arg Lys
180 185 190

Val Glu Pro Glu Arg Tyr Ile Val Val Gly Ser Asn Met Trp Gln Gly
195 200 205

Ile Asp Thr Phe Glu Tyr Leu Glu Val Pro Glu Asn Asp Asp Arg Ile
210 215 220

Ile Leu Ser Phe His Phe Tyr Asp Pro Phe Ile Leu Thr His Tyr Thr
225 230 235 240

Ala Ser Trp Gly Tyr Leu Arg Asp Tyr Ser Gly Pro Val Asn Tyr Pro
245 250 255

Gly Tyr Leu Val Thr Asn Asp Gln Leu Leu Asp Met Ser Asn Glu Met
260 265 270

Gln Lys Leu Ile Arg Glu Phe Gln Thr Asn Phe Asp Ile Tyr Thr Ile
275 280 285

Glu Glu Leu Ile Ser Ile Pro Tyr Ser Ile Ala Lys Glu Lys Gly Leu
290 295 300

Lys Leu Tyr Cys Gly Glu Phe Gly Ala Ile Asp Gln Ala Pro Arg Asp
305 310 315 320

Ala Arg Leu Ala Trp Tyr Arg Asp Val Val Gln Val Phe Glu Arg Tyr
325 330 335

Gly Ile Ala His Ala Asn Trp Asn Tyr Lys Asp Tyr Gly Thr Phe Gly
340 345 350

Ile Lys Asn Tyr Ser Glu Glu Ile Asp Gln Glu Leu Phe Glu Ile Leu
355 360 365

Ile Gly Thr Lys His Lys
370

<210> 79 
<211> 1017 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 



<400> 79 



atgaaatata aagctatttt tatatacctt attgttttga ttctatttta ctcaattaat 60

atttatgcta atgcagaaaa caaccccctc cccttcctca gtgtcgaagg aaacaggttc 120

gtcgatgaag atggaaatac ggtaatcctg cgaggtgtat cgttccccga tcccgaccgg 180

ctggctgagg caactcaatg gaacaagcga tacttccagg cggcaaaaga ctggaactgt 240

aatgtcgtcc ggattcctgt ccatccacag aaatggcggg aaagaggcga ggaaaattat 300

ctgaaacttt tagataaggg aattcaatgg gcgggtgaac tcgggatgta tgtaatcatc 360

gactggcata ccatcggtaa tccgataacc gaagtatttt tccgcgaact atacaatacg 420

tcacgtgcgg agaccttcca gttctggaga acaatcgctg agcgctatgc cggtaacccg 480

gttgttgctt tctatgaact gttcaatgaa ccgaccgact acaacggccg tctcggaaga 540

atgaactggg atcagtataa agagtttatc gaggagataa ttcacatcat ctattctcac 600

gacgatacag ttatccctct cgttgccggt ttcgactggg cgtatgaact ccgccatata 660

aaagataaac ctatagattt tcccggcatc gcttatgtga ctcaccccta tccccagaaa 720

cgcgatccgc catgggaaga aaaatgggaa gaggatttcg ggtttgccgc cgatatgtat 780

ccggtgtttg caaccgagtt cggtttcatg ggggaggatg aattaggtgc acacataccc 840

gtcatcggcg atgaaacata cggcgaagcc attatcgatt acttttataa aaaggggata 900

tcgtggactg catgggtatt cgatccgctt tggtcgccgc agcttattag agactggtat 960

tttaccccgt cccgacaggg gcagtttttt aaagagaagt tgatggagtt gaattag 1017



<210> 80 
<211> 338
<212> PRT
<213> Unknown

<220>

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (25) 

<220> 

<221> DOMAIN 
<222> (38) . . . (315) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (141) . . . (144) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 80 

Met Lys Tyr Lys Ala Ile Phe Ile Tyr Leu Ile Val Leu Ile Leu Phe
1 5 10 15

Tyr Ser Ile Asn Ile Tyr Ala Asn Ala Glu Asn Asn Pro Leu Pro Phe
20 25 30

Leu Ser Val Glu Gly Asn Arg Phe Val Asp Glu Asp Gly Asn Thr Val
35 40 45

Ile Leu Arg Gly Val Ser Phe Pro Asp Pro Asp Arg Leu Ala Glu Ala
50 55 60

Thr Gln Trp Asn Lys Arg Tyr Phe Gln Ala Ala Lys Asp Trp Asn Cys
65 70 75 80

Asn Val Val Arg Ile Pro Val His Pro Gln Lys Trp Arg Glu Arg Gly
85 90 95

Glu Glu Asn Tyr Leu Lys Leu Leu Asp Lys Gly Ile Gln Trp Ala Gly
100 105 110

Glu Leu Gly Met Tyr Val Ile Ile Asp Trp His Thr Ile Gly Asn Pro
115 120 125

Ile Thr Glu Val Phe Phe Arg Glu Leu Tyr Asn Thr Ser Arg Ala Glu
130 135 140

Thr Phe Gln Phe Trp Arg Thr Ile Ala Glu Arg Tyr Ala Gly Asn Pro
145 150 155 160

Val Val Ala Phe Tyr Glu Leu Phe Asn Glu Pro Thr Asp Tyr Asn Gly
165 170 175

Arg Leu Gly Arg Met Asn Trp Asp Gln Tyr Lys Glu Phe Ile Glu Glu
180 185 190

Ile Ile His Ile Ile Tyr Ser His Asp Asp Thr Val Ile Pro Leu Val
195 200 205

Ala Gly Phe Asp Trp Ala Tyr Glu Leu Arg His Ile Lys Asp Lys Pro
210 215 220

Ile Asp Phe Pro Gly Ile Ala Tyr Val Thr His Pro Tyr Pro Gln Lys
225 230 235 240

Arg Asp Pro Pro Trp Glu Glu Lys Trp Glu Glu Asp Phe Gly Phe Ala
245 250 255

Ala Asp Met Tyr Pro Val Phe Ala Thr Glu Phe Gly Phe Met Gly Glu
260 265 270

Asp Glu Leu Gly Ala His Ile Pro Val Ile Gly Asp Glu Thr Tyr Gly
275 280 285

Glu Ala Ile Ile Asp Tyr Phe Tyr Lys Lys Gly Ile Ser Trp Thr Ala
290 295 300

Trp Val Phe Asp Pro Leu Trp Ser Pro Gln Leu Ile Arg Asp Trp Tyr
305 310 315 320

Phe Thr Pro Ser Arg Gln Gly Gln Phe Phe Lys Glu Lys Leu Met Glu
325 330 335

Leu Asn



<210> 81 
<211> 1119 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 81 

atgaatttac ttgctcaata cttttccgga ctatttctga tttttttgat ctcaattttt 60 
ttcgttagtt ctgcagcgaa tcatcattat gaaaaaaata cagtcaacga attttctgat 120 



gatgtaaatc aaacaacatt agtccttcaa cccgggatat ccgaagccca gaatactcaa 180 
aacctgccgc ggatttcggt tgaaggaaac caatttgtgg atgaatcggg aaacacagtc 240


acatttcagg 


gtgtcagtgt 


tgccgatccg 




di^dciL^yv^^yy 


ccaatggaaa 


300 


cgggaactgt 


ttgaagaaat 


cgcaaactgg 


y y w-y ^d&^vy 


wwy oLiv^y isLfL. 


gcccatacac 


360 


ccgctctggt 


ggcgggaacg 


gggagaggag 


L'CtCtL'CL^^UOy 


3 t" rrrra ^~ i" 
ci.duyydL.L.yd 


tgaagccgtg 


420 


gagtgggcca 


aagagctgga 


gatgtacctc 




yy k^ct^cty udu 


cgggaacctg 


480 


cggacagaac 


tctttttcag 


ggatatctac 


CL n ci v.* ci ^ Va> 


y u>^y i^yddct-v^ 


ttatgaattc 


540 


tggaggctga 


tttcggatcg 


ctatgctgat 


y ci. d d ^ L> d ^ d ct 


¥• cif^ t" t* t" "H 5^ 


cgaaatcttt 


600 


aatgaaccca 


cacggcagca 


gggcaggctg 


^^dd^v^duy d 


L.y y ddy t-d 


atggaaggaa 


660 


attctaaccg 


acattatcac 


aatcatttat 




^ u y ci L. y v^y d l. 


tccgctggta 


720 


gcaggtttta 


actgggcgta 


tgaccttact 




ex d *w v_ i_. 


cgattttgaa 


780 


ggtattgcct atgttaccca cccatatccg caaaaaagaa gcaggccctg ggttccaaaa 840

tgggaagaag atttcggttt tgtggctgac aaatatcctg tatttgccac tgaattcggc 900

tatatgaggg agtatgagcg gggcgctcat gtgcccgtaa tcggggacga agaatatggg 960

gaaatcctca tcaattattt ccgcgaaaaa gggatttcgt ggacagcctg ggtattcgat 1020

ccaagctggt cgccacagct cattcaggat tgggattata cacccacacg ctcaggtgag 1080

tttttcagaa atgcgatgag aacgaaaaac aatgaataa 1119



<210> 82 
<211> 372 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (25) 

<220>

<221> DOMAIN 
<222> (70) . . . (347) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (43)... (46) 

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (173) . . . (176)

<223> N-glycosylation site. Prosite id = PS00001
<400> 82 

Met Asn Leu Leu Ala Gln Tyr Phe Ser Gly Leu Phe Leu Ile Phe Leu
1 5 10 15



Ile Ser Ile Phe Phe Val Ser Ser Ala Ala Asn His His Tyr Glu Lys
20 25 30

Asn Thr Val Asn Glu Phe Ser Asp Asp Val Asn Gln Thr Thr Leu Val
35 40 45

Leu Gln Pro Gly Ile Ser Glu Ala Gln Asn Thr Gln Asn Leu Pro Arg
50 55 60

Ile Ser Val Glu Gly Asn Gln Phe Val Asp Glu Ser Gly Asn Thr Val
65 70 75 80

Thr Phe Gln Gly Val Ser Val Ala Asp Pro His Arg Leu Asn Asn Ala
85 90 95

Gly Gln Trp Lys Arg Glu Leu Phe Glu Glu Ile Ala Asn Trp Gly Ala
100 105 110

Asn Val Val Arg Leu Pro Ile His Pro Leu Trp Trp Arg Glu Arg Gly
115 120 125

Glu Glu Gln Tyr Leu Glu Trp Ile Asp Glu Ala Val Glu Trp Ala Lys
130 135 140

Glu Leu Glu Met Tyr Leu Ile Ile Asp Trp His Ser Ile Gly Asn Leu
145 150 155 160

Arg Thr Glu Leu Phe Phe Arg Asp Ile Tyr Asn Thr Thr Arg Arg Glu
165 170 175

Thr Tyr Glu Phe Trp Arg Leu Ile Ser Asp Arg Tyr Ala Asp Glu Thr
180 185 190

Thr Ile Ala Phe Tyr Glu Ile Phe Asn Glu Pro Thr Arg Gln Gln Gly
195 200 205

Arg Leu Gly Thr Met Thr Trp Lys Gln Trp Lys Glu Ile Leu Thr Asp
210 215 220

Ile Ile Thr Ile Ile Tyr Ala His Asn Pro Asp Ala Ile Pro Leu Val
225 230 235 240

Ala Gly Phe Asn Trp Ala Tyr Asp Leu Thr Pro Val Arg His Ser Pro
245 250 255

Leu Asp Phe Glu Gly Ile Ala Tyr Val Thr His Pro Tyr Pro Gln Lys
260 265 270

Arg Ser Arg Pro Trp Val Pro Lys Trp Glu Glu Asp Phe Gly Phe Val
275 280 285

Ala Asp Lys Tyr Pro Val Phe Ala Thr Glu Phe Gly Tyr Met Arg Glu
290 295 300

Tyr Glu Arg Gly Ala His Val Pro Val Ile Gly Asp Glu Glu Tyr Gly
305 310 315 320

Glu Ile Leu Ile Asn Tyr Phe Arg Glu Lys Gly Ile Ser Trp Thr Ala
325 330 335

Trp Val Phe Asp Pro Ser Trp Ser Pro Gln Leu Ile Gln Asp Trp Asp
340 345 350

Tyr Thr Pro Thr Arg Ser Gly Glu Phe Phe Arg Asn Ala Met Arg Thr
355 360 365

Lys Asn Asn Glu
370



<210> 83 
<211> 1089 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 83 



atgagccttg gcctgactgc aatcgagttg atcaatcgcg cccgcgccga tctgcgactg 60

ggcgtgccga tcgttctgcg cgagggcgac gtgcaggcgc tggtgctggc ggtcgagcca 120

gtaaccgagg cgcggctggg tgggctgcgc gggctggggc cagggctggt gcttgcaatc 180

acgcagcgcc gcgccacgac actgaaggcg cgcgcctatg atgaggatct tgcgcgagtg 240

gtggtgcccg agggggtagg ctgcgactgg ctgcgggcgg tggcggaccc ctccgacgat 300

ctgcgctttc cgatgaaggg cccgctgatg accgctcgcg agggcacggc cgcgctgcat 360

cgcgctgcac ttcaactggt gaaatccgcg cagcttcttc cggccgcact tgttcagccg 420

cttgcggatc ccgaggcgct gcccgtcacg gggctgacag tgctcgatat cgccgatgtc 480

agccgtgaat tggcgcgcga gacagtgttg tatccagtgg tgcatgcgcg cttgccgatg 540



ctggcggcgc aagcgggccg cgtgcatatc ttccgacccc gcgacggcgg cgttgagcat 600

tacgccatcg agatcggcca gcccgaccgt gccgcgcccg tgctcacgcg gctgcattcg 660

gcctgtttca caggcgatgt gctgggctcg ctcaaatgcg attgcggccc gcaactgcag 720

gcagcactcg cgcagatggg cgaggaaggc gcgggggtgc tgctctatct caatcaggag 780

ggtcgcggca tcgggcttgc caacaagatg cgcgcctatt cgctgcagga tcagggcttt 840

gacacggtcg aggccaatca ccgtctgggg ttcgaggatg acgagcggga tttccgcatc 900

ggggccgcgc ttctgcggcg gatggggttc tctcgggcgc ggctgctgac caacaaccct 960

cggaaggtga acatgctgaa tgcgcatcgg gtcgaagtgg tggaacgggt gccgcttcgg 1020

gtgggcgaga cggtcgagaa ccgcgcctat cttgccacca aggccgccaa atccgggcat 1080

ctgttgtga 1089

<210> 84
<211> 362
<212> PRT
<213> Unknown

<220>

<223> Obtained from environmental sample
<220>

<221> DOMAIN

<222> (168) . . . (339)

<223> GTP cyclohydrolase II

<400> 84

Met Ser Leu Gly Leu Thr Ala Ile Glu Leu Ile Asn Arg Ala Arg Ala
1 5 10 15

Asp Leu Arg Leu Gly Val Pro Ile Val Leu Arg Glu Gly Asp Val Gln
20 25 30

Ala Leu Val Leu Ala Val Glu Pro Val Thr Glu Ala Arg Leu Gly Gly
35 40 45

Leu Arg Gly Leu Gly Pro Gly Leu Val Leu Ala Ile Thr Gln Arg Arg
50 55 60

Ala Thr Thr Leu Lys Ala Arg Ala Tyr Asp Glu Asp Leu Ala Arg Val
65 70 75 80

Val Val Pro Glu Gly Val Gly Cys Asp Trp Leu Arg Ala Val Ala Asp
85 90 95

Pro Ser Asp Asp Leu Arg Phe Pro Met Lys Gly Pro Leu Met Thr Ala
100 105 110

Arg Glu Gly Thr Ala Ala Leu His Arg Ala Ala Leu Gln Leu Val Lys
115 120 125

Ser Ala Gln Leu Leu Pro Ala Ala Leu Val Gln Pro Leu Ala Asp Pro
130 135 140

Glu Ala Leu Pro Val Thr Gly Leu Thr Val Leu Asp Ile Ala Asp Val
145 150 155 160

Ser Arg Glu Leu Ala Arg Glu Thr Val Leu Tyr Pro Val Val His Ala
165 170 175

Arg Leu Pro Met Leu Ala Ala Gln Ala Gly Arg Val His Ile Phe Arg
180 185 190

Pro Arg Asp Gly Gly Val Glu His Tyr Ala Ile Glu Ile Gly Gln Pro
195 200 205

Asp Arg Ala Ala Pro Val Leu Thr Arg Leu His Ser Ala Cys Phe Thr
210 215 220

Gly Asp Val Leu Gly Ser Leu Lys Cys Asp Cys Gly Pro Gln Leu Gln
225 230 235 240

Ala Ala Leu Ala Gln Met Gly Glu Glu Gly Ala Gly Val Leu Leu Tyr
245 250 255

Leu Asn Gln Glu Gly Arg Gly Ile Gly Leu Ala Asn Lys Met Arg Ala
260 265 270

Tyr Ser Leu Gln Asp Gln Gly Phe Asp Thr Val Glu Ala Asn His Arg
275 280 285

Leu Gly Phe Glu Asp Asp Glu Arg Asp Phe Arg Ile Gly Ala Ala Leu
290 295 300

Leu Arg Arg Met Gly Phe Ser Arg Ala Arg Leu Leu Thr Asn Asn Pro
305 310 315 320

Arg Lys Val Asn Met Leu Asn Ala His Arg Val Glu Val Val Glu Arg
325 330 335

Val Pro Leu Arg Val Gly Glu Thr Val Glu Asn Arg Ala Tyr Leu Ala
340 345 350

Thr Lys Ala Ala Lys Ser Gly His Leu Leu
355 360



<210> 85 
<211> 1284 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 85 



gtgaacaccg cgcatcgcat cgaattccct cggcaattta tcttcggttc cgccactgct 60

gctcaccaag tggagggcaa caacgttcac aatgattggt gggcccacga gcatgccacc 120

gacacgaatg ccgtggagcc gtcgggcctc gcctgcgacc actttcggcg ctttgccgac 180

gacttccgcc tcttacgcca actcggacag ccagcgcacc gcctgtcgct ggaatggagc 240

cgcatcgaac cggcacccgg tgaaatcgat cgttcggcat tgtcccacta ccgccgagtc 300

ctgggtactt tgcgagacct cggaatcgag ccatgggtca ccatccacca cttcacttgc 360

cctcgctggt tcgtggaaca gggagggttt acacgcatgg attcagcgcg ctctctcgtt 420

cgccataccg aacgcgtggc gagggagttc tccgacctag tcacaaactg gtgcaccata 480

aatgagccaa acgtcgtggc agaactcggt tatcgcttcg gatactttcc gccgcggttg 540

caggacgatg agctggcagc ggaagtgctc accaacttct ttcgcttaca cgctgaaatg 600

gcagaagttt tgcgcgctca cgcgcagaga tcggcgcaaa tcggtatcac ccttgcgatg 660

caagcacacg agccgctgcg catcgaaagc gaagcggacc gcgcactggc ggcgcggcgc 720

gacgccgaga ccaacggcgt catgctcaac gccttgcgaa ccggtgtatt cgcctacccg 780

ggacgggagc cggtggaaat ccctggactg aaaacgtcat cgaccttcgt gggggtccag 840

tactattcgc gggtccgcta cgacgccgag tcgcaaggtc cagcaatgcc cgacttcgag 900

cgcaccctca gccaaatggg atgggaggtg tatcctgagg ggttcggccc cttgctcgag 960

cgcgcagcag aaactggact cgaagtgatc gtcacagaga acgggatggc gcacgacgat 1020

gaccgtgtgc gcgtgcgttt tatcgccgac cacttgcggg tcgttcaccg ccttctggaa 1080

cgcggtgtgc gcatcggagg gtacttttac tggtcgacca tggacaactt cgaatggaac 1140

ttcgggtacg gaccgaagtt cggcctgatc gaagtggacc gttctaccct ggaacgcagg 1200

ccgcggcgaa gcgcgtattt cttccgtgac atgatccagc agcgagtgct cgacgacgac 1260

ctggtcgagc actggactcg ctga 1284



<210> 86 
<211> 427 
<212> PRT 
<213> Unknown 
<220>

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (5) . . . (417) 

<223> Glycosyl hydrolase family 1 
<220> 

<221> SITE 

<222> (334) . . . (342) 

<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572 
<400> 86 

Met Asn Thr Ala His Arg Ile Glu Phe Pro Arg Gln Phe Ile Phe Gly
1 5 10 15

Ser Ala Thr Ala Ala His Gln Val Glu Gly Asn Asn Val His Asn Asp
20 25 30

Trp Trp Ala His Glu His Ala Thr Asp Thr Asn Ala Val Glu Pro Ser
35 40 45

Gly Leu Ala Cys Asp His Phe Arg Arg Phe Ala Asp Asp Phe Arg Leu
50 55 60

Leu Arg Gln Leu Gly Gln Pro Ala His Arg Leu Ser Leu Glu Trp Ser
65 70 75 80

Arg Ile Glu Pro Ala Pro Gly Glu Ile Asp Arg Ser Ala Leu Ser His
85 90 95

Tyr Arg Arg Val Leu Gly Thr Leu Arg Asp Leu Gly Ile Glu Pro Trp
100 105 110

Val Thr Ile His His Phe Thr Cys Pro Arg Trp Phe Val Glu Gln Gly
115 120 125

Gly Phe Thr Arg Met Asp Ser Ala Arg Ser Leu Val Arg His Thr Glu
130 135 140

Arg Val Ala Arg Glu Phe Ser Asp Leu Val Thr Asn Trp Cys Thr Ile
145 150 155 160

Asn Glu Pro Asn Val Val Ala Glu Leu Gly Tyr Arg Phe Gly Tyr Phe
165 170 175

Pro Pro Arg Leu Gln Asp Asp Glu Leu Ala Ala Glu Val Leu Thr Asn
180 185 190

Phe Phe Arg Leu His Ala Glu Met Ala Glu Val Leu Arg Ala His Ala
195 200 205

Gln Arg Ser Ala Gln Ile Gly Ile Thr Leu Ala Met Gln Ala His Glu
210 215 220

Pro Leu Arg Ile Glu Ser Glu Ala Asp Arg Ala Leu Ala Ala Arg Arg
225 230 235 240

Asp Ala Glu Thr Asn Gly Val Met Leu Asn Ala Leu Arg Thr Gly Val
245 250 255

Phe Ala Tyr Pro Gly Arg Glu Pro Val Glu Ile Pro Gly Leu Lys Thr
260 265 270

Ser Ser Thr Phe Val Gly Val Gln Tyr Tyr Ser Arg Val Arg Tyr Asp
275 280 285

Ala Glu Ser Gln Gly Pro Ala Met Pro Asp Phe Glu Arg Thr Leu Ser
290 295 300

Gln Met Gly Trp Glu Val Tyr Pro Glu Gly Phe Gly Pro Leu Leu Glu
305 310 315 320

Arg Ala Ala Glu Thr Gly Leu Glu Val Ile Val Thr Glu Asn Gly Met
325 330 335

Ala His Asp Asp Asp Arg Val Arg Val Arg Phe Ile Ala Asp His Leu
340 345 350

Arg Val Val His Arg Leu Leu Glu Arg Gly Val Arg Ile Gly Gly Tyr
355 360 365

Phe Tyr Trp Ser Thr Met Asp Asn Phe Glu Trp Asn Phe Gly Tyr Gly
370 375 380

Pro Lys Phe Gly Leu Ile Glu Val Asp Arg Ser Thr Leu Glu Arg Arg
385 390 395 400

Pro Arg Arg Ser Ala Tyr Phe Phe Arg Asp Met Ile Gln Gln Arg Val
405 410 415

Leu Asp Asp Asp Leu Val Glu His Trp Thr Arg
420 425

<210> 87
<211> 1167 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 87 



atgagaaaga gtgtgttcac cctcgccgtg tttttgtcgg cactgtttgc attcacgtct 60

tgtcagaaca agagccagaa cgaggctcaa gaccaggcag gacaagtcaa taacttccgc 120

atcaagcgcg gcacgaacat cagccactgg ctgtcgcagt cggagcagcg cggtgaggct 180

cgcagactgc atatccagga ggacgacttc gcccgtctgg aagagctggg cttcgacttc 240

gtgcgcatcc ccatcgacga ggtgcagttc tgggacgagc agggcaacaa gctgcccgag 300

gcgtgggatc tgctgaacaa cgccctcgac tggagcaaga agcacaacct gcgtgccatc 360

gtcgacctgc acatcatccg tgcgcactat ttcaatgccg tgaatgaggc agaccaggcc 420

gccaataccc tcttcacctc tgaggaggca caggaaggac tccttaacct gtggcgccag 480

ctctccgagt tcctgaagga ccgcagcaac gactgggtgg cctacgagtt catgaacgag 540

ccggtagccc ctgagcacga gatgtggaac cagctggtag ccaaggtaca caaggccctg 600

cgcgaactgg aaccccagcg tacactcgtc gtcggctcga acatgtggca gggacacgag 660

acgatgaagt atctgaaagt gcccgagggc gataagaaca tcatcctctc gttccactac 720

tacaacccga tgctgctgac gcactacggt gcctggtggt cgccgctgtg tgctgcctac 780

aagggtaagg tgaactatcc cggtgtgctc gtgtcgaagg aagactacga tgccgctcct 840

gctgccatca aggatcagct gaagcccttt accgaggaag tatggaacat cgacaagatc 900

cgtgagcagt tcaaggatgc catcgaggcc gccaagaaat atgacctgca actgttctgc 960

ggcgagtggg gtgtctatga gcccgtggac cgtgagctgg cctacaaatg gtatcgtgac 1020

gtgctgacgg tgttcgacga gttcaacatc gcctggacga cctggtgcta cgatgctgac 1080

ttcggtttct gggatcagca gcgccactgc tacaaagact atccgctggt ggagctcctg 1140

atgtcaggaa agaaactggg agaatag 1167



<210> 88 
<211> 388 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (23) 
<220>

<221> DOMAIN 
<222> (48) . . . (365) 

<223> Cellulase (glycosyl hydrolase family 5)
<220> 

<221> SITE 

<222> (23) . . . (26)

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (46) . . . (49) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 88 

Met Arg Lys Ser Val Phe Thr Leu Ala Val Phe Leu Ser Ala Leu Phe
1 5 10 15

Ala Phe Thr Ser Cys Gln Asn Lys Ser Gln Asn Glu Ala Gln Asp Gln
20 25 30

Ala Gly Gln Val Asn Asn Phe Arg Ile Lys Arg Gly Thr Asn Ile Ser
35 40 45

His Trp Leu Ser Gln Ser Glu Gln Arg Gly Glu Ala Arg Arg Leu His
50 55 60

Ile Gln Glu Asp Asp Phe Ala Arg Leu Glu Glu Leu Gly Phe Asp Phe
65 70 75 80

Val Arg Ile Pro Ile Asp Glu Val Gln Phe Trp Asp Glu Gln Gly Asn
85 90 95

Lys Leu Pro Glu Ala Trp Asp Leu Leu Asn Asn Ala Leu Asp Trp Ser
100 105 110

Lys Lys His Asn Leu Arg Ala Ile Val Asp Leu His Ile Ile Arg Ala
115 120 125

His Tyr Phe Asn Ala Val Asn Glu Ala Asp Gln Ala Ala Asn Thr Leu
130 135 140

Phe Thr Ser Glu Glu Ala Gln Glu Gly Leu Leu Asn Leu Trp Arg Gln
145 150 155 160

Leu Ser Glu Phe Leu Lys Asp Arg Ser Asn Asp Trp Val Ala Tyr Glu
165 170 175

Phe Met Asn Glu Pro Val Ala Pro Glu His Glu Met Trp Asn Gln Leu
180 185 190

Val Ala Lys Val His Lys Ala Leu Arg Glu Leu Glu Pro Gln Arg Thr
195 200 205

Leu Val Val Gly Ser Asn Met Trp Gln Gly His Glu Thr Met Lys Tyr
210 215 220

Leu Lys Val Pro Glu Gly Asp Lys Asn Ile Ile Leu Ser Phe His Tyr
225 230 235 240

Tyr Asn Pro Met Leu Leu Thr His Tyr Gly Ala Trp Trp Ser Pro Leu
245 250 255

Cys Ala Ala Tyr Lys Gly Lys Val Asn Tyr Pro Gly Val Leu Val Ser
260 265 270

Lys Glu Asp Tyr Asp Ala Ala Pro Ala Ala Ile Lys Asp Gln Leu Lys
275 280 285

Pro Phe Thr Glu Glu Val Trp Asn Ile Asp Lys Ile Arg Glu Gln Phe
290 295 300

Lys Asp Ala Ile Glu Ala Ala Lys Lys Tyr Asp Leu Gln Leu Phe Cys
305 310 315 320

Gly Glu Trp Gly Val Tyr Glu Pro Val Asp Arg Glu Leu Ala Tyr Lys
325 330 335

Trp Tyr Arg Asp Val Leu Thr Val Phe Asp Glu Phe Asn Ile Ala Trp
340 345 350

Thr Thr Trp Cys Tyr Asp Ala Asp Phe Gly Phe Trp Asp Gln Gln Arg
355 360 365

His Cys Tyr Lys Asp Tyr Pro Leu Val Glu Leu Leu Met Ser Gly Lys
370 375 380

Lys Leu Gly Glu
385



<210> 89 
<211> 1500 
<212> DNA 
<213> Unknown

<223> Obtained from environmental sample



<400> 89 



^5 C7r^ 3 ^9 r» rr t" 1- 


t-dy i-v^ uc ud u 


cuccdtcgca 


cguccatcaa 


tgacagtatt aacaattagc 


60 


yy ^y ^^y^yy 


i.^ d u dy ddy L. 


dL.ct-y cagca 


ggggcgaaaa 


cgcctgtagc ccttaatggc 


120 


L'CLy ^ u uci.y L>cL 


L- Udddggcac 


tcagctagtc 


aatcaaaacg 


gaaaaccggt gcagctgaag 


180 


yyyciL.ca.ycc- 


cacacggttt 


gcagtggttc 


ggcgattatg 


tcaataaaga cactttaaaa 


■ 240 


1~ ftftr^ a o /T ^ 

t-yycuaayag 


acgattgggg 


aattaccgtc 


ttccgggcgg 


caatgtacac ggctgacggc 


300 


gy t caua tcy 


agaatccgtc 


tgtgaaaaat 


aaagtcaaag 


aagctgttga agcggcaaaa 


360 


y ay C u c y g g a 


tatatgtcat 


cattgactgg 


catattttaa 


atgacggcaa tccaaatcaa 


420 


aataaagaga aggcgaagga attctttaag gaaatgtcaa gcctttacgg aagctcacca 480

aacgttatat atgaaattgc taatgaaccg aacggtgatg taaattggaa gcgcgatatc 540


3 a Si /^y-i/^'H -a 4— «■ 

aaaccy uacy 


cggaagaagt 


gatttctgtt 


atccgtaaaa 


atgacccgga taacatcatt 


600 


a. i^civ^oy y cicL 


*-«>gy Cdcccg 


gagccaggat 


gtcaacgatg 


ctgcggatga tcagcttaag 


660 


y CI. i_y v^cLdctL-y 


uGdt.gL.dCgc 


gccccacttc 


tatgccggta 


cacacggcca gtttttaagg 


720 


(A «v C4, <A d y y y 


d^ L.dugcgc c 


cagcaaagga 


gctccgattt 


ttgtaacgga atgggggacg 


780 


w.y i-y doyc* l. l 


ccggaaacgg 


3-^9g9tatac 


cttgaccagt 


ogagggaatg gctgaattat 


840 


^ uwy cn-cty v-^d 


agaaaatcag 


ctgggtaaac 


tggaaccttt 


ctgataagca ggaatcatcc 


900 


i-v*ciy W I* \^ L.CLCL 


dgc cgggggc 


atctaaaaca 


ggcggctggc 


cgttatcaga tttatccgct 


960 


u ciy y y d V- d. L. 


utgcaagaga 


aaacattcgc 


ggctcccaaa 


attcgagtga agacagatct 


1020 




dy oddy dy dd 


acccgcacag 


gaaaacagca 


tctctgtgca atacagaaca 


1080 


ggggatggaa gtgtgaacag caaccaaatc cgtcctcaga tcaatgtgaa aaacaacagc 1140

aagaccaccg ttaacttaaa aaatgtaact gtccgctact ggtataacac gaaaaacaaa 1200

ggccaaaact tcgactgtga ttacgcgaag atcggatgca gcaatgtgac gcacaagttt 1260

gtgacattac ataaacctgt aaaaggtgca gatgcctatc tggaacttgg gtttagaaac 1320

gggacgctgt caccgggagc aagcaccgga gaaattcaaa ttcgtcttca caatgaggac 1380

tggagcaatt attcacaagc cggggattat tcttttttcc agtcgaatac gtttaaagat 1440

acaaaaaaaa tcacattata taataacgga aaactgattt ggggaacaga acccaaatag 1500



<210> 90 
<211> 499 
<212> PRT 
<213> Unknown

<220>

<223> Obtained from environmental sample
<220>

<221> SIGNAL 
<222> (1) . . . (29) 

<220> 

<221> DOMAIN 
<222> (47) . . . (301) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> DOMAIN 

<222> (356) . . . (437)

<223> Cellulose binding domain 

<220> 

<221> SITE 

<222> (164) . . . (173) 

<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659 
<220> 

<221> SITE 

<222> (296) . . . (299) 

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (339) . . . (342) 

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (383) . . . (386) 

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (393) . . . (396) 

<223> N-glycosylation site. Prosite id = PSOOOOl 
<220> 

<221> SITE 

<222> (421) . . . (424) 

<223> N-glycosylation site. Prosite id = PSOOOOl 
<220> 

<221> SITE 

<222> (446) . . . (449) 

<223> N-glycosylation site. Prosite id = PSOOOOl 
<220> 

<221> SITE 

<222> (470) , . . (473) 

<223> N-glycbsylation site. Prosite id = PSOOOOl 
<400> 90 

Met Lys Arg Ser Val Ser Ile Phe Ile Ala Cys Leu Leu Met Thr Val
1 5 10 15

Leu Thr Ile Ser Gly Val Ala Ala Pro Glu Ala Ser Ala Ala Gly Ala
20 25 30

Lys Thr Pro Val Ala Leu Asn Gly Gln Leu Ser Ile Lys Gly Thr Gln
35 40 45

Leu Val Asn Gln Asn Gly Lys Pro Val Gln Leu Lys Gly Ile Ser Ser
50 55 60

His Gly Leu Gln Trp Phe Gly Asp Tyr Val Asn Lys Asp Thr Leu Lys
65 70 75 80

Trp Leu Arg Asp Asp Trp Gly Ile Thr Val Phe Arg Ala Ala Met Tyr
85 90 95

Thr Ala Asp Gly Gly Tyr Ile Glu Asn Pro Ser Val Lys Asn Lys Val
100 105 110

Lys Glu Ala Val Glu Ala Ala Lys Glu Leu Gly Ile Tyr Val Ile Ile
115 120 125

Asp Trp His Ile Leu Asn Asp Gly Asn Pro Asn Gln Asn Lys Glu Lys
130 135 140

Ala Lys Glu Phe Phe Lys Glu Met Ser Ser Leu Tyr Gly Ser Ser Pro
145 150 155 160

Asn Val Ile Tyr Glu Ile Ala Asn Glu Pro Asn Gly Asp Val Asn Trp
165 170 175

Lys Arg Asp Ile Lys Pro Tyr Ala Glu Glu Val Ile Ser Val Ile Arg
180 185 190

Lys Asn Asp Pro Asp Asn Ile Ile Ile Thr Gly Thr Gly Thr Trp Ser
195 200 205

Gln Asp Val Asn Asp Ala Ala Asp Asp Gln Leu Lys Asp Ala Asn Val
210 215 220

Met Tyr Ala Leu His Phe Tyr Ala Gly Thr His Gly Gln Phe Leu Arg
225 230 235 240

Asp Lys Ala Asp Tyr Ala Leu Ser Lys Gly Ala Pro Ile Phe Val Thr
245 250 255

Glu Trp Gly Thr Ser Asp Ala Ser Gly Asn Gly Gly Val Tyr Leu Asp
260 265 270

Gln Ser Arg Glu Trp Leu Asn Tyr Leu Asp Ser Lys Lys Ile Ser Trp
275 280 285

Val Asn Trp Asn Leu Ser Asp Lys Gln Glu Ser Ser Ser Ala Leu Lys
290 295 300

Pro Gly Ala Ser Lys Thr Gly Gly Trp Pro Leu Ser Asp Leu Ser Ala
305 310 315 320

Ser Gly Thr Phe Val Arg Glu Asn Ile Arg Gly Ser Gln Asn Ser Ser
325 330 335

Glu Asp Arg Ser Glu Thr Pro Lys Gln Glu Lys Pro Ala Gln Glu Asn
340 345 350

Ser Ile Ser Val Gln Tyr Arg Thr Gly Asp Gly Ser Val Asn Ser Asn
355 360 365

Gln Ile Arg Pro Gln Ile Asn Val Lys Asn Asn Ser Lys Thr Thr Val
370 375 380

Asn Leu Lys Asn Val Thr Val Arg Tyr Trp Tyr Asn Thr Lys Asn Lys
385 390 395 400

Gly Gln Asn Phe Asp Cys Asp Tyr Ala Lys Ile Gly Cys Ser Asn Val
405 410 415

Thr His Lys Phe Val Thr Leu His Lys Pro Val Lys Gly Ala Asp Ala
420 425 430

Tyr Leu Glu Leu Gly Phe Arg Asn Gly Thr Leu Ser Pro Gly Ala Ser
435 440 445

Thr Gly Glu Ile Gln Ile Arg Leu His Asn Glu Asp Trp Ser Asn Tyr
450 455 460

Ser Gln Ala Gly Asp Tyr Ser Phe Phe Gln Ser Asn Thr Phe Lys Asp
465 470 475 480

Thr Lys Lys Ile Thr Leu Tyr Asn Asn Gly Lys Leu Ile Trp Gly Thr
485 490 495

Glu Pro Lys
<210> 91
<211> 1725 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 91 



atgctgaaat taagtgataa cctaactttc ttgaaaagca aaccattttt tcttaatgaa 60

aaagaaatga agtgggtgga gaaaacactt caatccatgt ccttacatga aaaagtaggg 120

caattatttt gtcccattgg cggttcagat aataaacaag aattagaagc ctttattaag 180

gaatatcatc ctggcggcat catgtaccgt cctaatacag gagcaaaaat acaggaaaca 240

catcggttgt tacaagagct atccccggta cctttattaa tttctgctaa cttagaggcc 300

ggtggtaatg ggattgctac ggatggtact tacttcggaa agcaaatgca ggtggctgca 360

acagataatg aagaaatggc ctataaatta ggattagttg ctggccgtga aggccgtgtg 420

gccggttgta actgggcttt tgcaccaatt gttgatattg atatgaacta tcgaaaccca 480

attacaaacg taagaacgta tgggtctgac ccaattagag ttgcccaaat gtctaaagct 540

tttatgaagg gaattcatga aagcggactc gcagcagctg ttaagcattt cccaggggat 600

ggagtggatg atagagatca gcatctttta tcatctgtaa acaccttatc taccgaagaa 660

tgggatcaaa cctttgggat ggtttatcaa gaaatgatag acagtggggc aaaatcgatt 720

atggcgggcc atatcatgct ccctgaatat tcaagagaac tattgccggg tattgaagac 780

gaacaaatca tgcccgccac actagcacca gagttactta atggtttatt aagggaaaag 840

ttaggtttta atggtttaat cgtgactgat gcatccccta tgttagggtt cactacttcg 900

gaaagaagag aaattgctgt tcctaaggcg attgcttcgg gctgtgatat gtttctcttc 960

aaccgtaaca taaaagaaga ttatgagttc atgctgaatg gaattgaaac tggaattcta 1020

accttggaaa gagtagatga agctgttact agagtacttg ctcttaaagc atctctaggt 1080

ctgaatgtac aaaaggaatt gggaatatta gtacctgaag aagcggaatt gtcggtatta 1140

caatctgaag aacatttgga ttgggcaaga gaatgtgcag accaatcggt tacattagta 1200

aaggatacac aaaaactgct gcctattagt gctgatcagt ataaacgggt tcgactttat 1260

gtattgggtg atcaagaagg agggctaaag gaaggcggct ccgtcactca accgtttatc 1320

gattctctta aaaatgctgg ctttgaagta gatttatata atgacaagca agttaatttc 1380

caagaactgt ttatgagtgt aaacgagttt aaaaagaact atgatctgat catttatgtc 1440

gccaaccttg aaaccgctag taaccaaacg acagtcagaa ttaattggca gcagccgcta 1500

aatgccaacg ctccatggtt tgttaaagat ataccgacat tatttatttc ggttgctaac 1560

ccataccatc tacaggacgt accaatggtt aagacctata taaatgctta ttcatctaat 1620

gaatatgtgg tagaagcaat tgtagataaa atcttaggaa aatcagagtt taaagggaag 1680

aatcccgtcg atccgttttg tgggaaatgg gataccagac tttaa 1725

<210> 92
<211> 574
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (87)...(320)
<223> Glycosyl hydrolase family 3 N terminal domain

<220>
<221> SITE
<222> (7)...(10)
<223> N-glycosylation site. Prosite id = PS00001

<220>
<221> SITE
<222> (495)...(498)
<223> N-glycosylation site. Prosite id = PS00001

<400> 92

Met Leu Lys Leu Ser Asp Asn Leu Thr Phe Leu Lys Ser Lys Pro Phe
1 5 10 15

Phe Leu Asn Glu Lys Glu Met Lys Trp Val Glu Lys Thr Leu Gln Ser
20 25 30

Met Ser Leu His Glu Lys Val Gly Gln Leu Phe Cys Pro Ile Gly Gly
35 40 45

Ser Asp Asn Lys Gln Glu Leu Glu Ala Phe Ile Lys Glu Tyr His Pro
50 55 60

Gly Gly Ile Met Tyr Arg Pro Asn Thr Gly Ala Lys Ile Gln Glu Thr
65 70 75 80

His Arg Leu Leu Gln Glu Leu Ser Pro Val Pro Leu Leu Ile Ser Ala
85 90 95

Asn Leu Glu Ala Gly Gly Asn Gly Ile Ala Thr Asp Gly Thr Tyr Phe
100 105 110

Gly Lys Gln Met Gln Val Ala Ala Thr Asp Asn Glu Glu Met Ala Tyr
115 120 125

Lys Leu Gly Leu Val Ala Gly Arg Glu Gly Arg Val Ala Gly Cys Asn
130 135 140

Trp Ala Phe Ala Pro Ile Val Asp Ile Asp Met Asn Tyr Arg Asn Pro
145 150 155 160

Ile Thr Asn Val Arg Thr Tyr Gly Ser Asp Pro Ile Arg Val Ala Gln
165 170 175

Met Ser Lys Ala Phe Met Lys Gly Ile His Glu Ser Gly Leu Ala Ala
180 185 190

Ala Val Lys His Phe Pro Gly Asp Gly Val Asp Asp Arg Asp Gln His
195 200 205

Leu Leu Ser Ser Val Asn Thr Leu Ser Thr Glu Glu Trp Asp Gln Thr
210 215 220

Phe Gly Met Val Tyr Gln Glu Met Ile Asp Ser Gly Ala Lys Ser Ile
225 230 235 240

Met Ala Gly His Ile Met Leu Pro Glu Tyr Ser Arg Glu Leu Leu Pro
245 250 255

Gly Ile Glu Asp Glu Gln Ile Met Pro Ala Thr Leu Ala Pro Glu Leu
260 265 270

Leu Asn Gly Leu Leu Arg Glu Lys Leu Gly Phe Asn Gly Leu Ile Val
275 280 285

Thr Asp Ala Ser Pro Met Leu Gly Phe Thr Thr Ser Glu Arg Arg Glu
290 295 300

Ile Ala Val Pro Lys Ala Ile Ala Ser Gly Cys Asp Met Phe Leu Phe
305 310 315 320

Asn Arg Asn Ile Lys Glu Asp Tyr Glu Phe Met Leu Asn Gly Ile Glu
325 330 335

Thr Gly Ile Leu Thr Leu Glu Arg Val Asp Glu Ala Val Thr Arg Val
340 345 350

Leu Ala Leu Lys Ala Ser Leu Gly Leu Asn Val Gln Lys Glu Leu Gly
355 360 365

Ile Leu Val Pro Glu Glu Ala Glu Leu Ser Val Leu Gln Ser Glu Glu
370 375 380

His Leu Asp Trp Ala Arg Glu Cys Ala Asp Gln Ser Val Thr Leu Val
385 390 395 400

Lys Asp Thr Gln Lys Leu Leu Pro Ile Ser Ala Asp Gln Tyr Lys Arg
405 410 415

Val Arg Leu Tyr Val Leu Gly Asp Gln Glu Gly Gly Leu Lys Glu Gly
420 425 430

Gly Ser Val Thr Gln Pro Phe Ile Asp Ser Leu Lys Asn Ala Gly Phe
435 440 445

Glu Val Asp Leu Tyr Asn Asp Lys Gln Val Asn Phe Gln Glu Leu Phe
450 455 460

Met Ser Val Asn Glu Phe Lys Lys Asn Tyr Asp Leu Ile Ile Tyr Val
465 470 475 480

Ala Asn Leu Glu Thr Ala Ser Asn Gln Thr Thr Val Arg Ile Asn Trp
485 490 495

Gln Gln Pro Leu Asn Ala Asn Ala Pro Trp Phe Val Lys Asp Ile Pro
500 505 510

Thr Leu Phe Ile Ser Val Ala Asn Pro Tyr His Leu Gln Asp Val Pro
515 520 525

Met Val Lys Thr Tyr Ile Asn Ala Tyr Ser Ser Asn Glu Tyr Val Val
530 535 540

Glu Ala Ile Val Asp Lys Ile Leu Gly Lys Ser Glu Phe Lys Gly Lys
545 550 555 560

Asn Pro Val Asp Pro Phe Cys Gly Lys Trp Asp Thr Arg Leu
565 570



<210> 93 
<211> 546 
<212> DNA 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<400> 93



atgagaataa aaaatttaaa aacgaaccgt atcacaaacc cgctgggatt tgatatagga 60

aaaccacgta tatcttttgt cacttatgac actacggcta aaaagcaaac agcagcgcaa 120

atacaggttg cgctagatca agagtttacg aacctaacat ttgacagtgg gaaaagcacg 180

gagatagata gtctagcata cgaactgcca tttcaattag agtcttacac tcgctactac 240

tggcgtgtga ccgtttgggc ggataatggg gatgtggcca caagtgaaat tgcttggttt 300

gaaacagcca aactaggcga ttcttgggag gccaagtgga ttacccccga ttttgataag 360

gaaatccatc ccgtactatc aagggaattt gatttgtcaa aagaagtcgt ttctgcccgt 420

gcctatgttt gcggtttggg attatatgaa atggagatta atggtctaaa ggctggggat 480

gaatatctga cccctaattt caacgcctat gataaatggc tgcagtacca gacctatgat 540

attaca 546



<210> 94
<211> 182
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> SITE
<222> (51)...(54)
<223> N-glycosylation site. Prosite id = PS00001

<400> 94

Met Arg Ile Lys Asn Leu Lys Thr Asn Arg Ile Thr Asn Pro Leu Gly
1 5 10 15

Phe Asp Ile Gly Lys Pro Arg Ile Ser Phe Val Thr Tyr Asp Thr Thr
20 25 30

Ala Lys Lys Gln Thr Ala Ala Gln Ile Gln Val Ala Leu Asp Gln Glu
35 40 45

Phe Thr Asn Leu Thr Phe Asp Ser Gly Lys Ser Thr Glu Ile Asp Ser
50 55 60

Leu Ala Tyr Glu Leu Pro Phe Gln Leu Glu Ser Tyr Thr Arg Tyr Tyr
65 70 75 80

Trp Arg Val Thr Val Trp Ala Asp Asn Gly Asp Val Ala Thr Ser Glu
85 90 95

Ile Ala Trp Phe Glu Thr Ala Lys Leu Gly Asp Ser Trp Glu Ala Lys
100 105 110

Trp Ile Thr Pro Asp Phe Asp Lys Glu Ile His Pro Val Leu Ser Arg
115 120 125

Glu Phe Asp Leu Ser Lys Glu Val Val Ser Ala Arg Ala Tyr Val Cys
130 135 140

Gly Leu Gly Leu Tyr Glu Met Glu Ile Asn Gly Leu Lys Ala Gly Asp
145 150 155 160

Glu Tyr Leu Thr Pro Asn Phe Asn Ala Tyr Asp Lys Trp Leu Gln Tyr
165 170 175

Gln Thr Tyr Asp Ile Thr
180

<210> 95 
<211> 2298 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 95 



atgatcaatc aagatataaa acaattaatc tcacaaatga ccttggaaga aaaagctggt 60

ctttgttctg gattagattt ttggaattta aaaggtatcg aaagactggg aataccctcg 120

ataatggtaa ccgatggtcc gcatggactc cgtaaacaaa aaatgggagc agatcattta 180

gggctgtttg acagtattcc tgcgacatgt ttcccatctg cagccggttt agctagtact 240

tggaataaag agttaatata tgaagttggg gttgcattag gaaaggaatg ccaggcagag 300

gatgtggcaa tacttcttgg ccctggagca aacattaagc gctcacccct ttgtggcaga 360

aactttgaat atttttcgga agatccattc ctttcatcag aaatggctgc gtcccatatc 420

aagggtgttc aaagtgaggg ggttgggaca tcacttaagc acttcgctgc aaataatcaa 480

gaacaccgaa gaatgtcgac agatgctatt gtggatgaaa ggacgttgcg agaaatatat 540

ttggccagct ttgaaaacgc tgtaaagaaa gcgcagccat ggactgtgat gtgcgcctac 600

aacaaggtca atggagactt tgcatcagaa aataaaacat tgttaactga catcctgcga 660

gatgagtggg gctttgaagg aattgttgtt tctgactggg gggcggttaa tgaacctgtt 720

gacggattaa atgccgggtt agacctggaa atgccttcaa gtagtgggat tggtgaaaag 780

aaaatcatca atgctgtaag aaatggtcag cttttagaag ataaactaga tcaggcagtt 840

gaaagaattc tacgtattat cttaatggca gtagaaaaca agaaagaaac cgctgactat 900

gataaagaac aacatcataa gcttgcaaga aaagcagcaa gtgaaagtat ggttttatta 960

aagaatgaag ataatatcct gccgttaaag aaagaaggaa ccatttcgat tattggttca 1020

tttgccaaaa aaccaaggta tcaaggcggt ggaagctcac acattaaccc gacaaagctt 1080

gaaaatatct atgaagaaat agagaaaaca gcgggccaaa atgtgaacgt tttatacgcg 1140

gaaggatatc atcttgaaaa ggatttaatc gatgatcaat taattgaaga ggcaaaaaaa 1200

acggcagcaa aatccgatgt aaccgtattg tttgtaggtc ttcctgaccg atatgaatct 1260

gaaggatatg atagagagca cctgaatata ccggagaatc accgtctttt agtcgaagcg 1320

gttgcggaag tacaaaagaa tatagttgtt gtactaagta atggggcacc gcttgttatg 1380

ccatggcttg ataaggtgaa ggggctgctg gaaagttacc tgggaggtca ggcactagga 1440

gsrtgcgattg cagacatcct attcggagaa gttaatccaa gtggaaagct tgccgaaact 1500

tttcccgtaa aattaggtga caatccttct tatctcaact ttccaggaga gagggataaa 1560

gttgagtata aagaaggcat ctttgttggt tatcgttatt acgatacaaa acagattgag 1620

ccgctgtttc catttggata tggtttaagc tatacaaact ttgaatataa aaaccttgta 1680

attgataaaa aagaaataaa agatacagaa attgtcacag ttaccgtgaa tgtgaaaaat 1740

acaggaaaag tgcctgggaa agaaatcatc cagttatatg taaaagatat aaaaagcagt 1800

gtagttcgtc ctgaaaaaga gttaaaaggc tttggaaagg tttccttaca gcctggggaa 1860

gacaaaacta tttcctttaa attggataaa cgcgcatttg catattacaa cacggaattg 1920

aaggattggt atgtagaatc aggagaattt gaaattttgg tggggaaatc gtccagagaa 1980

attgaactaa cagaaaaaat tatggttcac tctacttccc cagttttctt ggaggttcac 2040

cgaaattcca cggtcggaga tcttttaact gatccaattc taggtgaaaa agctaatgct 2100

ctaattagag agctaacaaa aggaagtcca ttatttgatg ctgggtcaga tcacggagag 2160

ggtgcagaaa tgatggaagc gatgttaaaa tacatgcctt tgcgtgctct tatgaatttt 2220

agtggtggag acattaccga agagaaacta actgaattta ttaaggaact taattcaact 2280

aattttgtaa gcctttaa 2298



<210> 96
<211> 765
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (30)...(252)
<223> Glycosyl hydrolase family 3 N terminal domain

<220>
<221> DOMAIN
<222> (317)...(531)
<223> Glycosyl hydrolase family 3 C terminal domain

<220>
<221> SITE
<222> (214)...(217)
<223> N-glycosylation site. Prosite id = PS00001

<220>
<221> SITE
<222> (221)...(238)
<223> Glycosyl hydrolases family 3 active site. Prosite id = PS00775

<220>
<221> SITE
<222> (692)...(695)
<223> N-glycosylation site. Prosite id = PS00001

<220>
<221> SITE
<222> (750)...(753)
<223> N-glycosylation site. Prosite id = PS00001

<220>
<221> SITE
<222> (769)...(772)
<223> N-glycosylation site. Prosite id = PS00001

<400> 96

Met Ile Asn Gln Asp Ile Lys Gln Leu Ile Ser Gln Met Thr Leu Glu
1 5 10 15

Glu Lys Ala Gly Leu Cys Ser Gly Leu Asp Phe Trp Asn Leu Lys Gly
20 25 30

Ile Glu Arg Leu Gly Ile Pro Ser Ile Met Val Thr Asp Gly Pro His
35 40 45

Gly Leu Arg Lys Gln Lys Met Gly Ala Asp His Leu Gly Leu Phe Asp
50 55 60

Ser Ile Pro Ala Thr Cys Phe Pro Ser Ala Ala Gly Leu Ala Ser Thr
65 70 75 80

Trp Asn Lys Glu Leu Ile Tyr Glu Val Gly Val Ala Leu Gly Lys Glu
85 90 95

Cys Gln Ala Glu Asp Val Ala Ile Leu Leu Gly Pro Gly Ala Asn Ile
100 105 110

Lys Arg Ser Pro Leu Cys Gly Arg Asn Phe Glu Tyr Phe Ser Glu Asp
115 120 125

Pro Phe Leu Ser Ser Glu Met Ala Ala Ser His Ile Lys Gly Val Gln
130 135 140

Ser Glu Gly Val Gly Thr Ser Leu Lys His Phe Ala Ala Asn Asn Gln
145 150 155 160

Glu His Arg Arg Met Ser Thr Asp Ala Ile Val Asp Glu Arg Thr Leu
165 170 175

Arg Glu Ile Tyr Leu Ala Ser Phe Glu Asn Ala Val Lys Lys Ala Gln
180 185 190

Pro Trp Thr Val Met Cys Ala Tyr Asn Lys Val Asn Gly Asp Phe Ala
195 200 205

Ser Glu Asn Lys Thr Leu Leu Thr Asp Ile Leu Arg Asp Glu Trp Gly
210 215 220

Phe Glu Gly Ile Val Val Ser Asp Trp Gly Ala Val Asn Glu Pro Val
225 230 235 240

Asp Gly Leu Asn Ala Gly Leu Asp Leu Glu Met Pro Ser Ser Ser Gly
245 250 255

Ile Gly Glu Lys Lys Ile Ile Asn Ala Val Arg Asn Gly Gln Leu Leu
260 265 270

Glu Asp Lys Leu Asp Gln Ala Val Glu Arg Ile Leu Arg Ile Ile Leu
275 280 285

Met Ala Val Glu Asn Lys Lys Glu Thr Ala Asp Tyr Asp Lys Glu Gln
290 295 300

His His Lys Leu Ala Arg Lys Ala Ala Ser Glu Ser Met Val Leu Leu
305 310 315 320

Lys Asn Glu Asp Asn Ile Leu Pro Leu Lys Lys Glu Gly Thr Ile Ser
325 330 335

Ile Ile Gly Ser Phe Ala Lys Lys Pro Arg Tyr Gln Gly Gly Gly Ser
340 345 350

Ser His Ile Asn Pro Thr Lys Leu Glu Asn Ile Tyr Glu Glu Ile Glu
355 360 365

Lys Thr Ala Gly Gln Asn Val Asn Val Leu Tyr Ala Glu Gly Tyr His
370 375 380

Leu Glu Lys Asp Leu Ile Asp Asp Gln Leu Ile Glu Glu Ala Lys Lys
385 390 395 400

Thr Ala Ala Lys Ser Asp Val Thr Val Leu Phe Val Gly Leu Pro Asp
405 410 415

Arg Tyr Glu Ser Glu Gly Tyr Asp Arg Glu His Leu Asn Ile Pro Glu
420 425 430

Asn His Arg Leu Leu Val Glu Ala Val Ala Glu Val Gln Lys Asn Ile
435 440 445

Val Val Val Leu Ser Asn Gly Ala Pro Leu Val Met Pro Trp Leu Asp
450 455 460

Lys Val Lys Gly Leu Leu Glu Ser Tyr Leu Gly Gly Gln Ala Leu Gly
465 470 475 480

Gly Ala Ile Ala Asp Ile Leu Phe Gly Glu Val Asn Pro Ser Gly Lys
485 490 495

Leu Ala Glu Thr Phe Pro Val Lys Leu Gly Asp Asn Pro Ser Tyr Leu
500 505 510

Asn Phe Pro Gly Glu Arg Asp Lys Val Glu Tyr Lys Glu Gly Ile Phe
515 520 525

Val Gly Tyr Arg Tyr Tyr Asp Thr Lys Gln Ile Glu Pro Leu Phe Pro
530 535 540

Phe Gly Tyr Gly Leu Ser Tyr Thr Asn Phe Glu Tyr Lys Asn Leu Val
545 550 555 560

Ile Asp Lys Lys Glu Ile Lys Asp Thr Glu Ile Val Thr Val Thr Val
565 570 575

Asn Val Lys Asn Thr Gly Lys Val Pro Gly Lys Glu Ile Ile Gln Leu
580 585 590

Tyr Val Lys Asp Ile Lys Ser Ser Val Val Arg Pro Glu Lys Glu Leu
595 600 605

Lys Gly Phe Gly Lys Val Ser Leu Gln Pro Gly Glu Asp Lys Thr Ile
610 615 620

Ser Phe Lys Leu Asp Lys Arg Ala Phe Ala Tyr Tyr Asn Thr Glu Leu
625 630 635 640

Lys Asp Trp Tyr Val Glu Ser Gly Glu Phe Glu Ile Leu Val Gly Lys
645 650 655

Ser Ser Arg Glu Ile Glu Leu Thr Glu Lys Ile Met Val His Ser Thr
660 665 670

Ser Pro Val Phe Leu Glu Val His Arg Asn Ser Thr Val Gly Asp Leu
675 680 685

Leu Thr Asp Pro Ile Leu Gly Glu Lys Ala Asn Ala Leu Ile Arg Glu
690 695 700

Leu Thr Lys Gly Ser Pro Leu Phe Asp Ala Gly Ser Asp His Gly Glu
705 710 715 720

Gly Ala Glu Met Met Glu Ala Met Leu Lys Tyr Met Pro Leu Arg Ala
725 730 735

Leu Met Asn Phe Ser Gly Gly Asp Ile Thr Glu Glu Lys Leu Thr Glu
740 745 750

Phe Ile Lys Glu Leu Asn Ser Thr Asn Phe Val Ser Leu
755 760 765



<210> 97 
<211> 615 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 97 



atgttatacc caattataac tgaaactcgc agtatcatcg atttaaatgg tatctggaaa 60

tttaaattag ataatggtga aggactgcag gaaaaatggt atgaaaacgg attaacagac 120

acgatcagta tggctgtacc atcttccttt aatgatattg gagtaaatgc cagtatacgc 180

gatcatgttg gctgggtatg gtatgagcgg gaattttctg tccccgccat ccttcaatct 240

gagcgtgtgg ttttgcgatt cggttccgca acacatctag ctaaggtttt cgtaaatggt 300

gaacttgttg ttgaacataa crggcggtttt ttaccgtttg aagcagaaat aaataagttt 360

ttacaaaaag ggaaaaatcg aataacggtt gctgtcaaca atattcttga ttactcaact 420

ttacccgttg gcacagtaat agaaaaggat attcctggag ttggcaaagt aatacgcaat 480

cagccaaatt ttgacttctt caactacgct ggcttgcacc gtccagtgaa aatatatact 540

acaccgacta cttatgtgaa ggatgtaacc attgtaacgg aaatagatgg acaggttcac 600

tattcaattg attaa 615


<210> 98
<211> 204
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (9)...(182)
<223> Glycosyl hydrolases family 2, sugar binding domain

<220>
<221> SITE
<222> (56)...(59)
<223> N-glycosylation site. Prosite id = PS00001

<400> 98

Met Leu Tyr Pro Ile Ile Thr Glu Thr Arg Ser Ile Ile Asp Leu Asn
1 5 10 15

Gly Ile Trp Lys Phe Lys Leu Asp Asn Gly Glu Gly Leu Gln Glu Lys
20 25 30

Trp Tyr Glu Asn Gly Leu Thr Asp Thr Ile Ser Met Ala Val Pro Ser
35 40 45

Ser Phe Asn Asp Ile Gly Val Asn Ala Ser Ile Arg Asp His Val Gly
50 55 60

Trp Val Trp Tyr Glu Arg Glu Phe Ser Val Pro Ala Ile Leu Gln Ser
65 70 75 80

Glu Arg Val Val Leu Arg Phe Gly Ser Ala Thr His Leu Ala Lys Val
85 90 95

Phe Val Asn Gly Glu Leu Val Val Glu His Lys Gly Gly Phe Leu Pro
100 105 110

Phe Glu Ala Glu Ile Asn Lys Phe Leu Gln Lys Gly Lys Asn Arg Ile
115 120 125

Thr Val Ala Val Asn Asn Ile Leu Asp Tyr Ser Thr Leu Pro Val Gly
130 135 140

Thr Val Ile Glu Lys Asp Ile Pro Gly Val Gly Lys Val Ile Arg Asn
145 150 155 160

Gln Pro Asn Phe Asp Phe Phe Asn Tyr Ala Gly Leu His Arg Pro Val
165 170 175

Lys Ile Tyr Thr Thr Pro Thr Thr Tyr Val Lys Asp Val Thr Ile Val
180 185 190

Thr Glu Ile Asp Gly Gln Val His Tyr Ser Ile Asp
195 200



<210> 99 
<211> 1404 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 99 

atgaatcatt ccctttcatt tccgccatcc tttgtatggg gcgcggcaac cgcaagctac 60 

caactggaag gatcaaccca aggcgtggac ggctgcgccg agtccgtctg ggatatgcac 120 

tgccgaagat ccggcgcgat caaggacggc tcgaacggat tcgtcgcctg cgatcactac 180 

catcgctatc gcgaggatgt ggcgctcatg aacgagcttg gcttgaatgc ctatcgattc 240 

tcaatcatgt ggccccgcgt catgcccgaa ggcaccggcg cggtgaacga gaagggcatg 300 

gatttctacg atcggttggt tgatgaactg ctcgccgccg gcatcacacc ttgggttact 360 

ttgttccact gggactttcc cctagccttg ttccaacgcg gtggctggct gaatgcggat 420 

tccccgcaat ggtttgagga ttacactcgg gaagtggtta aacgcttgtc ggatcgtgtg 480 

catcactggc taacgctcaa cgaaccggcg tgcttcattg agtttggcca ccgtaccggc 540 

atgcatgcac ccggcttgca actggcggac aaggaagcct gccgggtctg gcaccatgcc 600 

atgctggccc acggtcgcgc cgttcgcgct attcgccagg aatccgtgca tccatcaccc 660 

caggtcggct acgcgccggt cttccgcact accatcccgg acactgaaga tcctgccgac 720 

atcgaagcgg cccggacctc gatgtttgct catcaggccg gcaacctgtt cgatacgcgg 780 

tggaacctcg acccctgctt tcggggcgcg tatccggaga tcatgatgca gtattggggc 840 

gatgccgcgc cgcgcatcca ggacggcgac atggagttga tccgtcagga actcgatttt 900 

ctcggcctga atatttacca gtccgagcgc attcgggccg gtgcggatgg cgcacccgag 960 
gtggtgccat accctgcgga ttatccgcgc aaccagctcg gttggcccat cacgccggag 1020

gccctgcgct gggcgaccct ctttctcttt gaggagtacg ggaaacccct gatcatcaca 1080

gaaaacggaa tcaccctcga cgacaagccc aatgcagacg gcgaggtgaa tgatgtccag 1140

cggatcgctt ttctgaatga ctatcttagc ggtctccagc gcagcgtgga cgacggcatc 1200

cctgtactgg gctatttcca ctggtcgctg tgcgacaact ttgagtgggc agaaggctat 1260

gtccctcgct tcggcctgat ccatgtggac tatgccagtc aacgcagaac catcaaggcc 1320

tcaggacggt tttaccgcga catcattcgg ggccagacag ccacgccctg catcgcccaa 1380

tccagtcagc cggaaacaac ctaa 1404



<210> 100
<211> 467
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> DOMAIN
<222> (3)...(454)
<223> Glycosyl hydrolase family 1

<220>
<221> SITE
<222> (2)...(5)
<223> N-glycosylation site. Prosite id = PS00001

<220>
<221> SITE
<222> (11)...(25)
<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<400> 100

Met Asn His Ser Leu Ser Phe Pro Pro Ser Phe Val Trp Gly Ala Ala
1 5 10 15

Thr Ala Ser Tyr Gln Leu Glu Gly Ser Thr Gln Gly Val Asp Gly Cys
20 25 30

Ala Glu Ser Val Trp Asp Met His Cys Arg Arg Ser Gly Ala Ile Lys
35 40 45

Asp Gly Ser Asn Gly Phe Val Ala Cys Asp His Tyr His Arg Tyr Arg
50 55 60

Glu Asp Val Ala Leu Met Asn Glu Leu Gly Leu Asn Ala Tyr Arg Phe
65 70 75 80

Ser Ile Met Trp Pro Arg Val Met Pro Glu Gly Thr Gly Ala Val Asn
85 90 95

Glu Lys Gly Met Asp Phe Tyr Asp Arg Leu Val Asp Glu Leu Leu Ala
100 105 110

Ala Gly Ile Thr Pro Trp Val Thr Leu Phe His Trp Asp Phe Pro Leu
115 120 125

Ala Leu Phe Gln Arg Gly Gly Trp Leu Asn Ala Asp Ser Pro Gln Trp
130 135 140

Phe Glu Asp Tyr Thr Arg Glu Val Val Lys Arg Leu Ser Asp Arg Val
145 150 155 160

His His Trp Leu Thr Leu Asn Glu Pro Ala Cys Phe Ile Glu Phe Gly
165 170 175

His Arg Thr Gly Met His Ala Pro Gly Leu Gln Leu Ala Asp Lys Glu
180 185 190

Ala Cys Arg Val Trp His His Ala Met Leu Ala His Gly Arg Ala Val
195 200 205

Arg Ala Ile Arg Gln Glu Ser Val His Pro Ser Pro Gln Val Gly Tyr
210 215 220

Ala Pro Val Phe Arg Thr Thr Ile Pro Asp Thr Glu Asp Pro Ala Asp
225 230 235 240

Ile Glu Ala Ala Arg Thr Ser Met Phe Ala His Gln Ala Gly Asn Leu
245 250 255

Phe Asp Thr Arg Trp Asn Leu Asp Pro Cys Phe Arg Gly Ala Tyr Pro
260 265 270

Glu Ile Met Met Gln Tyr Trp Gly Asp Ala Ala Pro Arg Ile Gln Asp
275 280 285

Gly Asp Met Glu Leu Ile Arg Gln Glu Leu Asp Phe Leu Gly Leu Asn
290 295 300

Ile Tyr Gln Ser Glu Arg Ile Arg Ala Gly Ala Asp Gly Ala Pro Glu
305 310 315 320

Val Val Pro Tyr Pro Ala Asp Tyr Pro Arg Asn Gln Leu Gly Trp Pro
325 330 335

Ile Thr Pro Glu Ala Leu Arg Trp Ala Thr Leu Phe Leu Phe Glu Glu
340 345 350

Tyr Gly Lys Pro Leu Ile Ile Thr Glu Asn Gly Ile Thr Leu Asp Asp
355 360 365

Lys Pro Asn Ala Asp Gly Glu Val Asn Asp Val Gln Arg Ile Ala Phe
370 375 380

Leu Asn Asp Tyr Leu Ser Gly Leu Gln Arg Ser Val Asp Asp Gly Ile
385 390 395 400

Pro Val Leu Gly Tyr Phe His Trp Ser Leu Cys Asp Asn Phe Glu Trp
405 410 415

Ala Glu Gly Tyr Val Pro Arg Phe Gly Leu Ile His Val Asp Tyr Ala
420 425 430

Ser Gln Arg Arg Thr Ile Lys Ala Ser Gly Arg Phe Tyr Arg Asp Ile
435 440 445

Ile Arg Gly Gln Thr Ala Thr Pro Cys Ile Ala Gln Ser Ser Gln Pro
450 455 460

Glu Thr Thr
465



<210> 101 
<211> 1101 
<212> DNA 
<213> Unknown 










<220> 

<223> Obtained from environmental sample 






<400> 101 
atgagaaatc atctgaatgt acccttttac tttatcttct tttttttaat agcgtcaata 60

tttacagtct gttcatcatc aactgcttct gataacaatg agcatccacc gccagtggaa 120

gtcgcggatc aggacgcttt tcgtgatgct tttgaagtga atgaattact tggacgcggt 180

attaatctgg gtaatgccct tgaagcgccc aatgaaggcg aatggggaat ggtaatccag 240

gaagagtttc ttgatctgat acttgcagca ggttttgagt ctgtacgaat tccgattcgc 300

tggaatgccc atgccagtga aagtcaccct ttcaccattc aacgatcgtt ttttgatcgg 360

gttgatgaag tcatccaatg gtcgctggat cgtggccttt ctgtaatgat caatattcat 420

cactacaatg aactgatgca aaacccgcag cagcaccggc agcggttttt gcgactctgg 480

aaccagattg ctacacacta taaagattat ccggataatc tggtttttga aatccttaat 540

gaacctcatg ataatctgac tccttctatc tggaatagtt atttgaggga tgctattggc 600

atgattcgcc agacaaaccc acgcagggtt atcgctatcg gaacagcaaa ctggggtggt 660

ttcggagcat tatcacaact tgaaatcccc tcaaacgatc gccagatcat tgcaactgtt 720

cattattatg aacccttcag gttcacccat cagggggctg aatgggcagg accggaaaca 780

aacgattggc tggggacacg atgggatgga tcggatgagg aaaaatttga tattgaaagt 840

ggttttgatg ccgtacagtc ctgggcagtg acaaataacc ggcctgttca tctcggagaa 900

ttcggtgctt acagtactgc cgataatgaa tcacgcgaac gctggacaac ctttgttcgg 960

gaatccgctg agcaacgcaa tttcagctgg gcatactggg aatttgcagc cggttttggg 1020

atctatgacc gtaatcagtg gcaatggagg gattatctgt tgagggcttt gataccggat 1080

agcccggtcc tgttggagta a 1101



<210> 102
<211> 366
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample

<220>
<221> SIGNAL
<222> (1)...(29)

<220>
<221> DOMAIN
<222> (64)...(342)
<223> Cellulase (glycosyl hydrolase family 5)

<220>
<221> SITE
<222> (176)...(185)
<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659

<220>
<221> SITE
<222> (313)...(316)
<223> N-glycosylation site. Prosite id = PS00001

<220>
<221> SITE
<222> (332)...(335)
<223> N-glycosylation site. Prosite id = PS00001

<400> 102

Met Arg Asn His Leu Asn Val Pro Phe Tyr Phe Ile Phe Phe Phe Leu
1 5 10 15

Ile Ala Ser Ile Phe Thr Val Cys Ser Ser Ser Thr Ala Ser Asp Asn
20 25 30

Asn Glu His Pro Pro Pro Val Glu Val Ala Asp Gln Asp Ala Phe Arg
35 40 45

Asp Ala Phe Glu Val Asn Glu Leu Leu Gly Arg Gly Ile Asn Leu Gly
50 55 60

Asn Ala Leu Glu Ala Pro Asn Glu Gly Glu Trp Gly Met Val Ile Gln
65 70 75 80

Glu Glu Phe Leu Asp Leu Ile Leu Ala Ala Gly Phe Glu Ser Val Arg
85 90 95

Ile Pro Ile Arg Trp Asn Ala His Ala Ser Glu Ser His Pro Phe Thr
100 105 110

Ile Gln Arg Ser Phe Phe Asp Arg Val Asp Glu Val Ile Gln Trp Ser
115 120 125

Leu Asp Arg Gly Leu Ser Val Met Ile Asn Ile His His Tyr Asn Glu
130 135 140

Leu Met Gln Asn Pro Gln Gln His Arg Gln Arg Phe Leu Arg Leu Trp
145 150 155 160

Asn Gln Ile Ala Thr His Tyr Lys Asp Tyr Pro Asp Asn Leu Val Phe
165 170 175

Glu Ile Leu Asn Glu Pro His Asp Asn Leu Thr Pro Ser Ile Trp Asn
180 185 190

Ser Tyr Leu Arg Asp Ala Ile Gly Met Ile Arg Gln Thr Asn Pro Arg
195 200 205

Arg Val Ile Ala Ile Gly Thr Ala Asn Trp Gly Gly Phe Gly Ala Leu
210 215 220

Ser Gln Leu Glu Ile Pro Ser Asn Asp Arg Gln Ile Ile Ala Thr Val
225 230 235 240

His Tyr Tyr Glu Pro Phe Arg Phe Thr His Gln Gly Ala Glu Trp Ala
245 250 255

Gly Pro Glu Thr Asn Asp Trp Leu Gly Thr Arg Trp Asp Gly Ser Asp
260 265 270

Glu Glu Lys Phe Asp Ile Glu Ser Gly Phe Asp Ala Val Gln Ser Trp
275 280 285

Ala Val Thr Asn Asn Arg Pro Val His Leu Gly Glu Phe Gly Ala Tyr
290 295 300

Ser Thr Ala Asp Asn Glu Ser Arg Glu Arg Trp Thr Thr Phe Val Arg
305 310 315 320

Glu Ser Ala Glu Gln Arg Asn Phe Ser Trp Ala Tyr Trp Glu Phe Ala
325 330 335

Ala Gly Phe Gly Ile Tyr Asp Arg Asn Gln Trp Gln Trp Arg Asp Tyr
340 345 350

Leu Leu Arg Ala Leu Ile Pro Asp Ser Pro Val Leu Leu Glu
355 360 365



<210> 103 
<211> 1101 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 103 

atgctgataa ttggaggcct tcttgtttta ctgggatttt cttcttgcgg gcggcaggca 60 

gaacctgctg ctgactcttt cagggggttt catgactttg acatcaggcg tggggtgaac 120 

atcagccact ggttgtcgca gagtggaagg cgtggtgctg atcgggaggc gttctttacc 180 

agggcggatg tggaggccat cgccggcttc ggttatgatc acattcgttt gcccattgat 240 

gaggagcaga tgtgggatga gtcgggcaac aaggaaccac gtgcctttga attgctgcat 300 

gaagccattg gctgggcttt ggacaatgag ctcagggtca ttgtcgacct gcacatcatc 360 

aggtcgcact attttaatgc gcctgagaac ccgctttgga ccgatcgtgc tgaacagttg 420 

aaatttgttg agatgtggcg acagttgtct gatgagctgc agggctatcc gctcgatagg 480 

gtggcctatg aattgatgaa tgaggccgtg gctgatgatc cggacgattg gaaccggctt 540 

gtggctgaga cgatggaggc gctacggatg ctggaaccgg agcgcaagat tgtcattggc 600 

tccaaccgct ggcagtctgt gcatacattt cctgacctgg tgatcccgga taatgacccg 660 

catatcatat tgagttttca cttctacgaa ccatttctgc tgacgcacca caaggcctcc 720 
tggacacaca tccgtgatta caccggtccg gtgaactatc cgggtttgac tgtagacccg 780

acccacctgg aggggttgtc tgaagaactg gtgacccgga ttggccatca caatggggtg 840

tatacaaaag aaacgatgga ggagatgatc atgatcccac tgcaatatgc caaagaccgg 900

gggctccccc tttattgtgg agagtgggga tgtttcccga ccatgcccca ggagatgcgc 960

ctgcaatggt acgccgatgt gcgtgcgatc cfcggaaaagc atgagattgc ctgggcaaac 1020

tgggattaca agggtggttt cggtgtggtt gaccgcaacg gcgaacccca ccatgattta 1080

ttggaagtgc tcttaaaata a 1101



<210> 104
<211> 366 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (20) 

<220> 

<221> DOMAIN 
<222> (42)... (349) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (40)... (43) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 104 

Met Leu Ile Ile Gly Gly Leu Leu Val Leu Leu Gly Phe Ser Ser Cys
1 5 10 15

Gly Arg Gln Ala Glu Pro Ala Ala Asp Ser Phe Arg Gly Phe His Asp
20 25 30

Phe Asp Ile Arg Arg Gly Val Asn Ile Ser His Trp Leu Ser Gln Ser
35 40 45

Gly Arg Arg Gly Ala Asp Arg Glu Ala Phe Phe Thr Arg Ala Asp Val
50 55 60

Glu Ala Ile Ala Gly Phe Gly Tyr Asp His Ile Arg Leu Pro Ile Asp
65 70 75 80

Glu Glu Gln Met Trp Asp Glu Ser Gly Asn Lys Glu Pro Arg Ala Phe
85 90 95

Glu Leu Leu His Glu Ala Ile Gly Trp Ala Leu Asp Asn Glu Leu Arg
100 105 110

Val Ile Val Asp Leu His Ile Ile Arg Ser His Tyr Phe Asn Ala Pro
115 120 125

Glu Asn Pro Leu Trp Thr Asp Arg Ala Glu Gln Leu Lys Phe Val Glu
130 135 140

Met Trp Arg Gln Leu Ser Asp Glu Leu Gln Gly Tyr Pro Leu Asp Arg
145 150 155 160

Val Ala Tyr Glu Leu Met Asn Glu Ala Val Ala Asp Asp Pro Asp Asp
165 170 175

Trp Asn Arg Leu Val Ala Glu Thr Met Glu Ala Leu Arg Met Leu Glu
180 185 190

Pro Glu Arg Lys Ile Val Ile Gly Ser Asn Arg Trp Gln Ser Val His
195 200 205

Thr Phe Pro Asp Leu Val Ile Pro Asp Asn Asp Pro His Ile Ile Leu
210 215 220

Ser Phe His Phe Tyr Glu Pro Phe Leu Leu Thr His His Lys Ala Ser
225 230 235 240

Trp Thr His Ile Arg Asp Tyr Thr Gly Pro Val Asn Tyr Pro Gly Leu
245 250 255

Thr Val Asp Pro Thr His Leu Glu Gly Leu Ser Glu Glu Leu Val Thr
260 265 270

Arg Ile Gly His His Asn Gly Val Tyr Thr Lys Glu Thr Met Glu Glu
275 280 285

Met Ile Met Ile Pro Leu Gln Tyr Ala Lys Asp Arg Gly Leu Pro Leu
290 295 300

Tyr Cys Gly Glu Trp Gly Cys Phe Pro Thr Met Pro Gln Glu Met Arg
305 310 315 320

Leu Gln Trp Tyr Ala Asp Val Arg Ala Ile Leu Glu Lys His Glu Ile
325 330 335

Ala Trp Ala Asn Trp Asp Tyr Lys Gly Gly Phe Gly Val Val Asp Arg
340 345 350

Asn Gly Glu Pro His His Asp Leu Leu Glu Val Leu Leu Lys
355 360 365

<210> 105 
<211> 1047 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 105 



atgcaacact tcatcaacgg cgtcaacctg ggaggctggc tctcccaata ccagaaatac 60

gaccatgagc acttccgcac cttcatcacc cggcgcgata tcgaacaaat cgcatcctgg 120

ggcttcgacc acatccgcct gccggtcgat tatccggttc tcgaatcgga cgacgcgccc 180

ggtatctatc atgaagatgg ctttgcctat cttgactctt gcctggaatg gtgccaggcc 240

gctgggctgg cagtcgtctt cgacctgcat catgcccccg gctacagttt cacgaacacg 300

ctcaagcctg aaaccctgca cctgaacgta ctcttcgagc aggaaatcgc ccaaaatcga 360

tttatcgccc tctgggaaac cattgttcgg cgctaccagg ccgccggctt gcctatcatc 420

tttgaactac tgaatgaaat ggtgctgcca gacagcggcc cctggaacgc cctggcccac 480

aaaaccgtcg ccgccctgcg acagatttcg cccgattgca aaatcatgat tggcggcaat 540

aactacaacg ccgcatccga actcaaaaac ataaccctgc acaacgaccc caacatccta 600

tacaccttcc atttctacga accggccctg ttcacccacc agaaagcccc ctgggtgcag 660

attgctgtcg aatacaacca ggaactcgaa taccctggct cgtacaccaa cctggccgcc 720

tttctccggc gcaatcccca ctatcaagaa tcctatggat ggcaggtcaa ccgccgtatc 780

gaccgcgacc tcctgctcga attcacccaa cccgccctgg actttgtcca gcagaccggg 840

cgcgacctgt actgcggtga ^ttcggcgtc attgaatacg tcgagcctgc cagccgccaa 900

aactggcacg ccgacctgct ggacatcctg cgccagcaga agattggccg cgccgtctgg 960

acttataaac aaatggattt tggcctggtg gacgcggacg gcaaggtggt cgaccccaaa 1020

cttctcgaaa tcttgtgtca atcctga 1047



<210> 106 
<211> 348 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 
<221> DOMAIN
<222> (2)...(330)

<223> Cellulase (glycosyl hydrolase family 5)
<220>

<221> SITE

<222> (192)...(195)

<223> N-glycosylation site. Prosite id = PS00001
<400> 106 

Met Gln His Phe Ile Asn Gly Val Asn Leu Gly Gly Trp Leu Ser Gln
1 5 10 15

Tyr Gln Lys Tyr Asp His Glu His Phe Arg Thr Phe Ile Thr Arg Arg
20 25 30

Asp Ile Glu Gln Ile Ala Ser Trp Gly Phe Asp His Ile Arg Leu Pro
35 40 45

Val Asp Tyr Pro Val Leu Glu Ser Asp Asp Ala Pro Gly Ile Tyr His
50 55 60

Glu Asp Gly Phe Ala Tyr Leu Asp Ser Cys Leu Glu Trp Cys Gln Ala
65 70 75 80

Ala Gly Leu Ala Val Val Phe Asp Leu His His Ala Pro Gly Tyr Ser
85 90 95

Phe Thr Asn Thr Leu Lys Pro Glu Thr Leu His Leu Asn Val Leu Phe
100 105 110

Glu Gln Glu Ile Ala Gln Asn Arg Phe Ile Ala Leu Trp Glu Thr Ile
115 120 125

Val Arg Arg Tyr Gln Ala Ala Gly Leu Pro Ile Ile Phe Glu Leu Leu
130 135 140

Asn Glu Met Val Leu Pro Asp Ser Gly Pro Trp Asn Ala Leu Ala His
145 150 155 160

Lys Thr Val Ala Ala Leu Arg Gln Ile Ser Pro Asp Cys Lys Ile Met
165 170 175

Ile Gly Gly Asn Asn Tyr Asn Ala Ala Ser Glu Leu Lys Asn Ile Thr
180 185 190

Leu His Asn Asp Pro Asn Ile Leu Tyr Thr Phe His Phe Tyr Glu Pro
195 200 205

Ala Leu Phe Thr His Gln Lys Ala Pro Trp Val Gln Ile Ala Val Glu
210 215 220

Tyr Asn Gln Glu Leu Glu Tyr Pro Gly Ser Tyr Thr Asn Leu Ala Ala
225 230 235 240

Phe Leu Arg Arg Asn Pro His Tyr Gln Glu Ser Tyr Gly Trp Gln Val
245 250 255

Asn Arg Arg Ile Asp Arg Asp Leu Leu Leu Glu Phe Thr Gln Pro Ala
260 265 270

Leu Asp Phe Val Gln Gln Thr Gly Arg Asp Leu Tyr Cys Gly Glu Phe
275 280 285

Gly Val Ile Glu Tyr Val Glu Pro Ala Ser Arg Gln Asn Trp His Ala
290 295 300

Asp Leu Leu Asp Ile Leu Arg Gln Gln Lys Ile Gly Arg Ala Val Trp
305 310 315 320

Thr Tyr Lys Gln Met Asp Phe Gly Leu Val Asp Ala Asp Gly Lys Val
325 330 335

Val Asp Pro Lys Leu Leu Glu Ile Leu Cys Gln Ser
340 345



<210> 107 
<211> 1137 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 107 



atggaaaagc aaatctgttc aaatgttttc agtacgatgc tgataattgg aggccttctt 60

gttttactgg gattttcttc ttgcgggcgg caggcagaac ctgctgctga ctctttcagg 120

gggtttcacg actttgacat caggcgcggg gtgaacatca gccattggtt gtcgcagagt 180

ggaaggcgtg gtgctgatcg ggaggcgttc tttaccaggg cggatgtgga ggccatcgcc 240

ggcttcggtt atgatcacat tcgtttgccc atcgatgaag agcagatgtg ggatgagtcg 300

ggcaacaagg agccacgtgc ctttgaattg ctgcatgagg ccattggctg ggctttggac 360

aatgagctca gggtcattgt tgacctgcac atcatcaggt cgcactattt taatgcgcct 420

gagaacccgc tttggaccga tcgtgctgaa cagttgaaat ttgttgagat gtggcgacag 480

ttgtctgatg agctgcaggg ctatccgctc gatagggtgg cctatgaatt gatgaatgag 540

gccgtggctg atgatccgga cgattggaac cggcttgtgg ctgagacgat ggaggcgcta 600

cggatgctgg aaccggagcg caagattgtc attggctcca accgctggca gtctgtgcat 660

acatttcctg acctggtgat cccggataat gacccgcata tcatattgag ttttcacttc 720

tacgaaccat ttctgctgac gcaccacaag gcctcctgga cacacatccg tgattacacc 780

ggtccggtga actatccggg tttgactgta gacccgaccc acctggaggg gttgtctgaa 840

gaactggtga cccggattgg ccatcacaat ggggtgtata caaaagaaac gatggaggag 900

atgatcatga tcccactgca atatgccaaa gaacgggggc tccccctgta ttgcggggag 960

tggggatgtt tcccgaccat gccccaggag atgcgcctgc aatggtacgc cgatgtgcgt 1020

gcgatcctgg aaaagcatga gattgcctgg gcaaactggg attacaaggg tggtttcggt 1080

gtggttgacc gcaacggcga accccaccat gatttattgg aagtcttact aaaataa 1137



<210> 108 
<211> 378 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1)...(32) 

<220> 

<221> DOMAIN 
<222> (54) . . . (361) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (52) . . . (55) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 108 

Met Glu Lys Gln Ile Cys Ser Asn Val Phe Ser Thr Met Leu Ile Ile
1 5 10 15

Gly Gly Leu Leu Val Leu Leu Gly Phe Ser Ser Cys Gly Arg Gln Ala
20 25 30

Glu Pro Ala Ala Asp Ser Phe Arg Gly Phe His Asp Phe Asp Ile Arg
35 40 45

Arg Gly Val Asn Ile Ser His Trp Leu Ser Gln Ser Gly Arg Arg Gly
50 55 60

Ala Asp Arg Glu Ala Phe Phe Thr Arg Ala Asp Val Glu Ala Ile Ala
65 70 75 80

Gly Phe Gly Tyr Asp His Ile Arg Leu Pro Ile Asp Glu Glu Gln Met
85 90 95

Trp Asp Glu Ser Gly Asn Lys Glu Pro Arg Ala Phe Glu Leu Leu His
100 105 110

Glu Ala Ile Gly Trp Ala Leu Asp Asn Glu Leu Arg Val Ile Val Asp
115 120 125

Leu His Ile Ile Arg Ser His Tyr Phe Asn Ala Pro Glu Asn Pro Leu
130 135 140

Trp Thr Asp Arg Ala Glu Gln Leu Lys Phe Val Glu Met Trp Arg Gln
145 150 155 160

Leu Ser Asp Glu Leu Gln Gly Tyr Pro Leu Asp Arg Val Ala Tyr Glu
165 170 175

Leu Met Asn Glu Ala Val Ala Asp Asp Pro Asp Asp Trp Asn Arg Leu
180 185 190

Val Ala Glu Thr Met Glu Ala Leu Arg Met Leu Glu Pro Glu Arg Lys
195 200 205

Ile Val Ile Gly Ser Asn Arg Trp Gln Ser Val His Thr Phe Pro Asp
210 215 220

Leu Val Ile Pro Asp Asn Asp Pro His Ile Ile Leu Ser Phe His Phe
225 230 235 240

Tyr Glu Pro Phe Leu Leu Thr His His Lys Ala Ser Trp Thr His Ile
245 250 255

Arg Asp Tyr Thr Gly Pro Val Asn Tyr Pro Gly Leu Thr Val Asp Pro
260 265 270

Thr His Leu Glu Gly Leu Ser Glu Glu Leu Val Thr Arg Ile Gly His
275 280 285

His Asn Gly Val Tyr Thr Lys Glu Thr Met Glu Glu Met Ile Met Ile
290 295 300

Pro Leu Gln Tyr Ala Lys Glu Arg Gly Leu Pro Leu Tyr Cys Gly Glu
305 310 315 320

Trp Gly Cys Phe Pro Thr Met Pro Gln Glu Met Arg Leu Gln Trp Tyr
325 330 335

Ala Asp Val Arg Ala Ile Leu Glu Lys His Glu Ile Ala Trp Ala Asn
340 345 350

Trp Asp Tyr Lys Gly Gly Phe Gly Val Val Asp Arg Asn Gly Glu Pro
355 360 365

His His Asp Leu Leu Glu Val Leu Leu Lys



<210> 109 
<211> 1248 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 109 

atgaagacac atagcttcaa cctcagatca cggatcacct tgttgaccgc ggcactgctt 60 

ttcatcgggg caacggccgg ggccgccacg acacctatca ccctcaaaga cgcctacaaa 120 

gaccatttcc ttatgggtgt agccatcaac cgcctgattg caatgggcga tacgaatgtc 180 

cgggccgaca acgccagccg gaccccggaa cagctcaagg gggacattgc cctggtcaag 240 

gcgcagttca acctgatcgt caatgagaac gatctgaaac cgattctcat tcacccgagg 300 

ccaggaccgg acgggtacga cttcgcccca gcggatgcct tcgtgaagtt cggcatggac 360

aacaatatgt atatcgtggg ccacaccctc ctctggcaca gccaggtgcc caactggttc 420 

ttccaagggt ctgctccggc gactccggaa acgccacctg ctgccacgga cgcggcggtc 480 

gcaccccgcg gcggacgagg aggtcgcggc gggattaccg gccccctggc gacccgcgag 540 

gagttgatcg aacgcatgcg cgagcacatt cacaccgtcg tcggccgcta taagggaaag 600 

atcaaggtct gggacgtcgt caacgaagcc ctcgccgacg gcggcaccga gaccctgcga 660 
agcacgtact ggacccaaat catcgggccg gaattcatcg ccatggcctt tcgattcgcc 720

cacgaagccg atccggatgc gatccttcgt tacaacgatt atggcctgga gaaccctgcc 780 

aagcgtgaga aactcaagaa gctgatcgcg tcgctccagg agcagaacgt tccggttcat 840 

gccatcggca cgcaaaccca tatcagcgtc tccacgacgt tcgaaagaat ggatgagacc 900 

ttgagggacc tggcatccat cgggcttccc gtccacatca ccgaactgga tgtcaacgcc 960 
gccgcggggg gccagagggg caccaatgcg gacattgccg gcactgccga gcgtacggcg 1020

ggcggcgtgg tcagtgaagc cgacaagcgg ctggccgacg cctacgcgaa tctcttccgc 1080

gcgatcatga agcacaagga ctcggtgaag atggtcacgt tctggggcgt caatgacgcg 1140

gtttcgtggc tcgcacgcgg caccccgctg ctgttcgacg gcaacaatca gcccaagccg 1200

gctttcgatg cggtcattcg cgtcgccacg gaggcggcac agaactga 1248



<210> 110 
<211> 415 
<212> PRT 
<213> Unknown



<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (28) 

<220> 

<221> DOMAIN 
<222> (34) . . . (409) 

<223> Glycosyl hydrolase family 10 
<220> 

<221> SITE 

<222> (312) . . . (322) 

<223> Glycosyl hydrolases family 10 active site. Prosite id = PS00591 
<400> 110 

Met Lys Thr His Ser Phe Asn Leu Arg Ser Arg Ile Thr Leu Leu Thr
1 5 10 15

Ala Ala Leu Leu Phe Ile Gly Ala Thr Ala Gly Ala Ala Thr Thr Pro
20 25 30

Ile Thr Leu Lys Asp Ala Tyr Lys Asp His Phe Leu Met Gly Val Ala
35 40 45

Ile Asn Arg Leu Ile Ala Met Gly Asp Thr Asn Val Arg Ala Asp Asn
50 55 60

Ala Ser Arg Thr Pro Glu Gln Leu Lys Gly Asp Ile Ala Leu Val Lys
65 70 75 80

Ala Gln Phe Asn Leu Ile Val Asn Glu Asn Asp Leu Lys Pro Ile Leu
85 90 95

Ile His Pro Arg Pro Gly Pro Asp Gly Tyr Asp Phe Ala Pro Ala Asp
100 105 110

Ala Phe Val Lys Phe Gly Met Asp Asn Asn Met Tyr Ile Val Gly His
115 120 125

Thr Leu Leu Trp His Ser Gln Val Pro Asn Trp Phe Phe Gln Gly Ser
130 135 140

Ala Pro Ala Thr Pro Glu Thr Pro Pro Ala Ala Thr Asp Ala Ala Val
145 150 155 160

Ala Pro Arg Gly Gly Arg Gly Gly Arg Gly Gly Ile Thr Gly Pro Leu
165 170 175

Ala Thr Arg Glu Glu Leu Ile Glu Arg Met Arg Glu His Ile His Thr
180 185 190

Val Val Gly Arg Tyr Lys Gly Lys Ile Lys Val Trp Asp Val Val Asn
195 200 205

Glu Ala Leu Ala Asp Gly Gly Thr Glu Thr Leu Arg Ser Thr Tyr Trp
210 215 220

Thr Gln Ile Ile Gly Pro Glu Phe Ile Ala Met Ala Phe Arg Phe Ala
225 230 235 240

His Glu Ala Asp Pro Asp Ala Ile Leu Arg Tyr Asn Asp Tyr Gly Leu
245 250 255

Glu Asn Pro Ala Lys Arg Glu Lys Leu Lys Lys Leu Ile Ala Ser Leu
260 265 270

Gln Glu Gln Asn Val Pro Val His Ala Ile Gly Thr Gln Thr His Ile
275 280 285

Ser Val Ser Thr Thr Phe Glu Arg Met Asp Glu Thr Leu Arg Asp Leu
290 295 300

Ala Ser Ile Gly Leu Pro Val His Ile Thr Glu Leu Asp Val Asn Ala
305 310 315 320

Ala Ala Gly Gly Gln Arg Gly Thr Asn Ala Asp Ile Ala Gly Thr Ala
325 330 335

Glu Arg Thr Ala Gly Gly Val Val Ser Glu Ala Asp Lys Arg Leu Ala
340 345 350

Asp Ala Tyr Ala Asn Leu Phe Arg Ala Ile Met Lys His Lys Asp Ser
355 360 365

Val Lys Met Val Thr Phe Trp Gly Val Asn Asp Ala Val Ser Trp Leu
370 375 380

Ala Arg Gly Thr Pro Leu Leu Phe Asp Gly Asn Asn Gln Pro Lys Pro
385 390 395 400

Ala Phe Asp Ala Val Ile Arg Val Ala Thr Glu Ala Ala Gln Asn
405 410 415

<210> 111 
<211> 1131 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 



<400> 111 
atgcgaagac tgatcaccat catccttgcg acggctgtcg caatcttatc gaccacatca 60

tgctccaaga ccgctgaacg agagggcttc ttgatcaagc gaggaaccaa cctcagccat 120

tggctctccc agagcaagga aaggggagag gctcgcaggc tccatatcca ggaggatgac 180

tttgctcgcc tcgacagcct cggtttcgac catgtgcgca tccctgtcga cgaggaacaa 240

ctctgggacg aggatggcaa caagctcaca gaagcatggg aactgctcga tttcgccctc 300

gacatggcgc gcaagtacaa cctgcgcgct atcgtggacc ttcacatcat ccgcgcccat 360

tacttcaacg ccgtcaacga aggcgcgtcg aatactctct tcaccagcga ggaggcgcag 420

cagggcctga tcaacctttg gtaccagctt tccgacttcc tcaaggaccg cagcgtcgac 480

tgggttgcct acgagttcat gaacgagcca gtcgctcctg agcatgagca atggaacgcc 540

ctcgtcgcaa aggtgcacaa ggcgcttcgt gagcgtgaac cggagcgtac cctcgtgatc 600

ggttctaacc tgtggcaggg tcaccagacc ttcaagtacc tccgcgtacc tgagaatgac 660

ccgaacatca tcctgagctt ccactactac aacccttcga tcctcaccca caacatggct 720

ccgtggactc cggtgggcaa atataccggt tccatcaatt atccgggcgt catcgtctct 780

gctgaggatt acgctgcgca gagccctgag gtgcaggccg aggtgaagca gtatacggag 840

atggtctgga accgcgacac gatctacagc cagatgaagg atgcgatcga ggtggctgcc 900

agctatggac tgcagctctt ctgcggcgaa tggggcgtgt atgaacctgt cgaccgtgag 960

cttgcatacg catggaccaa ggatatgctg tcggtgttcg acgagttcga catcgcatgg 1020

acgacctggt gttacgatgc cgacttcggc ttctgggacc aggcgaaaca tgatttcaag 1080

gacaagcctc ttgtcgatct cctgatgggt tccaagggtc ttgaacaata g 1131

<210> 112
<211> 376 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (22) 

<220> 

<221> DOMAIN 
<222> (39) . . . (353) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (37) . . . (40) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 112 

Met Arg Arg Leu Ile Thr Ile Ile Leu Ala Thr Ala Val Ala Ile Leu
1 5 10 15

Ser Thr Thr Ser Cys Ser Lys Thr Ala Glu Arg Glu Gly Phe Leu Ile
20 25 30

Lys Arg Gly Thr Asn Leu Ser His Trp Leu Ser Gln Ser Lys Glu Arg
35 40 45

Gly Glu Ala Arg Arg Leu His Ile Gln Glu Asp Asp Phe Ala Arg Leu
50 55 60

Asp Ser Leu Gly Phe Asp His Val Arg Ile Pro Val Asp Glu Glu Gln
65 70 75 80

Leu Trp Asp Glu Asp Gly Asn Lys Leu Thr Glu Ala Trp Glu Leu Leu
85 90 95

Asp Phe Ala Leu Asp Met Ala Arg Lys Tyr Asn Leu Arg Ala Ile Val
100 105 110

Asp Leu His Ile Ile Arg Ala His Tyr Phe Asn Ala Val Asn Glu Gly
115 120 125

Ala Ser Asn Thr Leu Phe Thr Ser Glu Glu Ala Gln Gln Gly Leu Ile
130 135 140

Asn Leu Trp Tyr Gln Leu Ser Asp Phe Leu Lys Asp Arg Ser Val Asp
145 150 155 160

Trp Val Ala Tyr Glu Phe Met Asn Glu Pro Val Ala Pro Glu His Glu
165 170 175

Gln Trp Asn Ala Leu Val Ala Lys Val His Lys Ala Leu Arg Glu Arg
180 185 190

Glu Pro Glu Arg Thr Leu Val Ile Gly Ser Asn Leu Trp Gln Gly His
195 200 205

Gln Thr Phe Lys Tyr Leu Arg Val Pro Glu Asn Asp Pro Asn Ile Ile
210 215 220

Leu Ser Phe His Tyr Tyr Asn Pro Ser Ile Leu Thr His Asn Met Ala
225 230 235 240

Pro Trp Thr Pro Val Gly Lys Tyr Thr Gly Ser Ile Asn Tyr Pro Gly
245 250 255

Val Ile Val Ser Ala Glu Asp Tyr Ala Ala Gln Ser Pro Glu Val Gln
260 265 270

Ala Glu Val Lys Gln Tyr Thr Glu Met Val Trp Asn Arg Asp Thr Ile
275 280 285

Tyr Ser Gln Met Lys Asp Ala Ile Glu Val Ala Ala Ser Tyr Gly Leu
290 295 300

Gln Leu Phe Cys Gly Glu Trp Gly Val Tyr Glu Pro Val Asp Arg Glu
305 310 315 320

Leu Ala Tyr Ala Trp Thr Lys Asp Met Leu Ser Val Phe Asp Glu Phe
325 330 335

Asp Ile Ala Trp Thr Thr Trp Cys Tyr Asp Ala Asp Phe Gly Phe Trp
340 345 350

Asp Gln Ala Lys His Asp Phe Lys Asp Lys Pro Leu Val Asp Leu Leu
355 360 365

Met Gly Ser Lys Gly Leu Glu Gln
370 375



<210> 113 
<211> 1095 
<212> DNA
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 113 



atgaaggtga cccgaacagc tgtcgcgggc attgtcgccg cagcggtcct catcacgatc 60

ggcacgtcga ccgcgtcggc tgaggatgaa ccaaccagcg agaacacgtc gacggatcag 120

ccgttgcgcg tcctggcagc caaagccggg atcgcgttcg gcacggccgt cgacatgaac 180

gcgtacaaca acgacgcgac ctaccgtgag ctcgtcggcc aggagttctc gagcgtcacg 240

gccgagaacg tcatgaagtg gcagctcctc gagccgcagc gaggggtcta caactggggt 300

ccggccgatc agctcgtgcg cgtagccaac gagaacggcc agaaggtgcg cgggcacacg 360

ctcatctggc acaaccagct gcccacctgg cttaccagcg gagtcgcctc cggtgagatc 420

acaccggacg agctccggca gctcctgagg aaccacatct tcacggtgat gcgccacttc 480

aagggcgaga tccaccagtg ggatgtcgcc aacgaggtca tcgacgacag cggcaacctg 540

cgcaacacga tctggctgca gaacctgggt ccgagctaca tcgcggacgc gttccggtgg 600

gctcgcaagg ccgacccgga cgccgccctc tatctgaacg actacaacgt cgagggcccg 660

aacgccaagg ccgatgcgta ctacgccctg gtcaagcagc tcctcgccga cgacgtgccg 720

gtggacggct tcggaataca ggggcacctc ggtgtgcagt tcggcttctg gcccgcgagt 780

gcggtggccg acaacatggg gcgcttcgag gcactcggcc tgcagacggc ggtcaccgag 840

gcggatgtcc ggatgatcat gccgcccgac gaggacaagc tggccgcaca ggcacgtggc 900

tacagcacgt tggtccaggg ctgcctgatg gccaagcgtt gcaggtcgtt caccgtctgg 960

ggcttcaccg acaagtactc ctgggttccg ggcaccttcc ccggccaggg cgcggcgaac 1020

ctcctggccg aggacttcca gcccaagccg gcttactacg ccgtccagga tgacctcgcg 1080

cgcgccggac ggtag 1095



<210> 114 
<211> 364 
<212> PRT 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (27) 

<220> 

<221> DOMAIN 
<222> (41)...(359)

<223> Glycosyl hydrolase family 10
<220>

<221> SITE

<222> (35)...(38)

<223> N-glycosylation site. Prosite id = PS00001
<400> 114 

Met Lys Val Thr Arg Thr Ala Val Ala Gly Ile Val Ala Ala Ala Val
1 5 10 15

Leu Ile Thr Ile Gly Thr Ser Thr Ala Ser Ala Glu Asp Glu Pro Thr
20 25 30

Ser Glu Asn Thr Ser Thr Asp Gln Pro Leu Arg Val Leu Ala Ala Lys
35 40 45

Ala Gly Ile Ala Phe Gly Thr Ala Val Asp Met Asn Ala Tyr Asn Asn
50 55 60

Asp Ala Thr Tyr Arg Glu Leu Val Gly Gln Glu Phe Ser Ser Val Thr
65 70 75 80

Ala Glu Asn Val Met Lys Trp Gln Leu Leu Glu Pro Gln Arg Gly Val
85 90 95

Tyr Asn Trp Gly Pro Ala Asp Gln Leu Val Arg Val Ala Asn Glu Asn
100 105 110

Gly Gln Lys Val Arg Gly His Thr Leu Ile Trp His Asn Gln Leu Pro
115 120 125

Thr Trp Leu Thr Ser Gly Val Ala Ser Gly Glu Ile Thr Pro Asp Glu
130 135 140

Leu Arg Gln Leu Leu Arg Asn His Ile Phe Thr Val Met Arg His Phe
145 150 155 160

Lys Gly Glu Ile His Gln Trp Asp Val Ala Asn Glu Val Ile Asp Asp
165 170 175

Ser Gly Asn Leu Arg Asn Thr Ile Trp Leu Gln Asn Leu Gly Pro Ser
180 185 190

Tyr Ile Ala Asp Ala Phe Arg Trp Ala Arg Lys Ala Asp Pro Asp Ala
195 200 205

Ala Leu Tyr Leu Asn Asp Tyr Asn Val Glu Gly Pro Asn Ala Lys Ala
210 215 220

Asp Ala Tyr Tyr Ala Leu Val Lys Gln Leu Leu Ala Asp Asp Val Pro
225 230 235 240

Val Asp Gly Phe Gly Ile Gln Gly His Leu Gly Val Gln Phe Gly Phe
245 250 255

Trp Pro Ala Ser Ala Val Ala Asp Asn Met Gly Arg Phe Glu Ala Leu
260 265 270

Gly Leu Gln Thr Ala Val Thr Glu Ala Asp Val Arg Met Ile Met Pro
275 280 285

Pro Asp Glu Asp Lys Leu Ala Ala Gln Ala Arg Gly Tyr Ser Thr Leu
290 295 300

Val Gln Gly Cys Leu Met Ala Lys Arg Cys Arg Ser Phe Thr Val Trp
305 310 315 320

Gly Phe Thr Asp Lys Tyr Ser Trp Val Pro Gly Thr Phe Pro Gly Gln
325 330 335

Gly Ala Ala Asn Leu Leu Ala Glu Asp Phe Gln Pro Lys Pro Ala Tyr
340 345 350

Tyr Ala Val Gln Asp Asp Leu Ala Arg Ala Gly Arg
355 360



<210> 115 
<211> 774 
<212> DNA 
<213> Unknown 










<220> 

<223> Obtained from environmental sample 






<400> 115 
atggacttgc agctaggcgg aaagcgcgtg ctgatcacgg gtgcgtccaa aggcatcggc 60

ctggcctgcg ccgtcgcctt tgcgcgcgag ggtgccgacc cgattctggt ggcgcgcgat 120

gatgcggcgt tgcatcacgc cacgtccgcc atccgcgaac aaagcggccg cgcggcacat 180

gccatcacgc tggacctggc cctgcctggc gcggcggaaa agctggccaa ggaaaccggc 240

cccatcgaca tactggtcaa caacgcgggc gcggtgcccg gcggcgcgct ggaccaggtg 300

caagacgaac gctggcgcgc gggctgggaa ttgaaagtgc acggctacat cagcctggcg 360

cgctgctact acccgcacat gcgcgaagcg ggcgcgggcg tcatcgccaa catcatcggc 420

atggcgggcg cggcgccccg cgccgactac atctgcggcg cggcggccaa tgcctcactg 480

u u c gc y c t gggtggcgaa gcgccccgcc acggcgtgcg cgtctttggc 540

gtcaacccct cgcgcacgcg gaccgaccgc gtgctgaccc tggcccggca acgcgcgcag 600

gcgcgctggg gcgacgaaac ccgttggcag gaaacgctgt cggacctgcc cttcaaccgg 660

ctgatggaac ccgccgaagt ggccgacatg attgtgttcg gcgcctcgcc gcgcgcgggt 720

tacctgagcg gcacggtcat cgacctggac ggcggcgaac agtacgcgaa atag 774



<210> 116 
<211> 257 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 

<222> (8) . . . (172) 

<223> short chain dehydrogenase 

<220> 

<221> SITE 

<222> (159) . . . (162) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 116 

Met Asp Leu Gln Leu Gly Gly Lys Arg Val Leu Ile Thr Gly Ala Ser
1 5 10 15

Lys Gly Ile Gly Leu Ala Cys Ala Val Ala Phe Ala Arg Glu Gly Ala
20 25 30

Asp Pro Ile Leu Val Ala Arg Asp Asp Ala Ala Leu His His Ala Thr
35 40 45

Ser Ala Ile Arg Glu Gln Ser Gly Arg Ala Ala His Ala Ile Thr Leu
50 55 60

Asp Leu Ala Leu Pro Gly Ala Ala Glu Lys Leu Ala Lys Glu Thr Gly
65 70 75 80

Pro Ile Asp Ile Leu Val Asn Asn Ala Gly Ala Val Pro Gly Gly Ala
85 90 95

Leu Asp Gln Val Gln Asp Glu Arg Trp Arg Ala Gly Trp Glu Leu Lys
100 105 110

Val His Gly Tyr Ile Ser Leu Ala Arg Cys Tyr Tyr Pro His Met Arg
115 120 125

Glu Ala Gly Ala Gly Val Ile Ala Asn Ile Ile Gly Met Ala Gly Ala
130 135 140

Ala Pro Arg Ala Asp Tyr Ile Cys Gly Ala Ala Ala Asn Ala Ser Leu
145 150 155 160

Ile Ala Phe Thr Arg Ala Leu Gly Gly Glu Ala Pro Arg His Gly Val
165 170 175

Arg Val Phe Gly Val Asn Pro Ser Arg Thr Arg Thr Asp Arg Val Leu
180 185 190

Thr Leu Ala Arg Gln Arg Ala Gln Ala Arg Trp Gly Asp Glu Thr Arg
195 200 205

Trp Gln Glu Thr Leu Ser Asp Leu Pro Phe Asn Arg Leu Met Glu Pro
210 215 220

Ala Glu Val Ala Asp Met Ile Val Phe Gly Ala Ser Pro Arg Ala Gly
225 230 235 240

Tyr Leu Ser Gly Thr Val Ile Asp Leu Asp Gly Gly Glu Gln Tyr Ala
245 250 255

Lys



<210> 117 
<211> 747 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 117 



atgcccaaag tcatgctcgt taccggcggc agccgtggca tcggcgccgc cgtcgccaag 60

ctggccgcgc gccgcggcta cgcggtcggc atcaactacc gcacccattc cgacgccgcc 120

gacgccgtcg tggccgagat ccagcaggcg ggcggcaccg cgctggccat ccaggccgac 180

gtgtcgcaag aagatgacgt gctgcacatg ttccgcacgc tggacgagcg cctgggccgc 240

atcgacgcgc tggtcaataa cgccggcatc ctggaaacgc agatgcgcct ggaccagatg 300

gaagcggacc gcctgctgcg cgtgctgtcc accaacgtca tcggcgcttt cctgtgtgcg 360

cgcgaagcgg tgcgcaggat gtcgacgcgc catggcggcg tgggcggcgc catcgtcaac 420

gtgtcttcgg cggcggcgcg cctgggctcg cccaatgaat acgtggatta cgcggcctcc 480

aagggcgcgc tggacacgat gaccatcggc ctgtccaaag aggtagcgcc cgaaggtatc 540

cgcgtgaatg gcgtgcgccc cggcaccatc tacaccgaca tgcacgcaag cggcggcgag 600

ccgggccggg tggatcgcct gaaaagcgtg atcccgctgc ggcgcggcgg ctcggtggaa 660

gaagtggcgg gcgccgtcat gtggctgttt tccgaagaag ccggctatac cagcggctcg 720

ttcatcgacg tgtccggcgg tagttga 747



<210> 118 
<211> 248 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 

<222> (3)... (176) 

<223> short chain dehydrogenase 

<220> 

<221> SITE 

<222> (142)...(145)

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (146) . . . (174) 

<223> Short-chain dehydrogenases/reductases family signature. Prosite id = 
PS00061 

<400> 118 

Met Pro Lys Val Met Leu Val Thr Gly Gly Ser Arg Gly Ile Gly Ala
1 5 10 15

Ala Val Ala Lys Leu Ala Ala Arg Arg Gly Tyr Ala Val Gly Ile Asn
20 25 30

Tyr Arg Thr His Ser Asp Ala Ala Asp Ala Val Val Ala Glu Ile Gln
35 40 45

Gln Ala Gly Gly Thr Ala Leu Ala Ile Gln Ala Asp Val Ser Gln Glu
50 55 60

Asp Asp Val Leu His Met Phe Arg Thr Leu Asp Glu Arg Leu Gly Arg
65 70 75 80

Ile Asp Ala Leu Val Asn Asn Ala Gly Ile Leu Glu Thr Gln Met Arg
85 90 95

Leu Asp Gln Met Glu Ala Asp Arg Leu Leu Arg Val Leu Ser Thr Asn
100 105 110

Val Ile Gly Ala Phe Leu Cys Ala Arg Glu Ala Val Arg Arg Met Ser
115 120 125

Thr Arg His Gly Gly Val Gly Gly Ala Ile Val Asn Val Ser Ser Ala
130 135 140

Ala Ala Arg Leu Gly Ser Pro Asn Glu Tyr Val Asp Tyr Ala Ala Ser
145 150 155 160

Lys Gly Ala Leu Asp Thr Met Thr Ile Gly Leu Ser Lys Glu Val Ala
165 170 175

Pro Glu Gly Ile Arg Val Asn Gly Val Arg Pro Gly Thr Ile Tyr Thr
180 185 190

Asp Met His Ala Ser Gly Gly Glu Pro Gly Arg Val Asp Arg Leu Lys
195 200 205

Ser Val Ile Pro Leu Arg Arg Gly Gly Ser Val Glu Glu Val Ala Gly
210 215 220

Ala Val Met Trp Leu Phe Ser Glu Glu Ala Gly Tyr Thr Ser Gly Ser
225 230 235 240

Phe Ile Asp Val Ser Gly Gly Ser
245



<210> 119 
<211> 1611 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<400> 119 



atgcaaaagc ggtatgacgt cattgtcgtg ggcagcggga tcgccggcct cagttttgcg 60

ctaaaagtcg ccaaggcggg gcatcgcgta gggattttga ccaaaaaaga ccgtgctgaa 120

agcaacacca attatgccca aggcggcatc gcggcagtca cttcgcagac agatgatttc 180

gagctgcatg tgcaggacac attgaccgcg ggagatggac tctgcgacga ggcagtcgtc 240

cgcacgatta tcggcgaggc tcccgcccga atccaggagc tgatcgattt gggggtggcc 300

ttctcacatt tggaagatgg acgggtttcc ctccatcgcg aagggggtca ctcgaaaagg 360

cgcattcttc acgttcagga tgtcaccggc aaagcgattg aagaagccct cctccatgcc 420


a. t c cf aa. c ag t 


cgccgctgat 


cgaccfcgaat 


gagcacgtct 


t tcrccat"ccra 

^ ^ y v.^ w y 


c 1 1~ t* crap t" 


480 


gaacgcaagc 


t CJCTC cjc t a cr c 


aaactttaaa 


atCTcyaaaata 

y ^y y "-"-yy ^y 


ctaaaaacccr 


cr cr t crcr t c cr era 

yy ^yy wv^yyc*. 


540 


ctctatgcgc 


tcgatgaagc 


cac tcaggag 


gttcacgtat 


ttgaggctcc 


acrtccrtcatQ 


600 


ctggcaacgg 


cr a crcr c cr t c cr d" 


gcaggtc tac 


ctctacagca 


ccaacGcaag 


era tCQcaaGc 


660 


agtCTataaat 


tggccatggc 


ttaccaacrct 


aaccrcccraaa 


tccgcaacct 


caaatcrtatc 


720 


caatttcatc 


ctacagcgct 


atacaccacc 


accaatgacc 


gctttctgat 


cagcgaagc c 


780 


gtccggggtg 


aaggggccat 


cctccgcaat 


caggagggaa 


9^99ctttcat 


ggctcgctac 


840 


gatgaccgca 


aggacctcgc 


cccccgggat 


attgtggcca 


gagcaattga 


cagtgaaatg 


900 


aagcagtccg 


gctcatcGca 


tgtctggctc 


gacatcactc 


a t c crcr era t era 

'-'■^^^yyy^^^y^ 


aaccgatctg 


960 


ccraoaaccrtt 

^ ^ ^ (.iHLV^ ViP ^ W 


tccccaacat 


t h t ccracrcrcc 

w w w w y y ^ ^ 


t~ err' r* t* ct^? rrrr 

^ y ^ w ^ y ^^CiiLy y 


t" r'fTftArr^T'a^ 
Va>v^yyM.y w^mci 


^ y y i^y wCLCi 


1020 


tcctccatcc 


ccrcrhcrcrh tec 


crcfcoatcrcac 




rrs (*TfTr*rr1~ a rrr* 
yciyyv.»y uay^ 




1080 


aa t gcggc ca 


ccgacatcac 


tcrcractcrttt 

^y y Cp4r w w y w w 


y L-i- uy uyyy y 


3 PI erf" t" fTr* r* frr 


^ci^y yy au- i-y 


1140 


catcratcrcca 


accgtctcgc 


' cagcaacagc 


^L^y^^uyyciyy 


r" A frt" rrrrt* t* 
^cLy v^yy L>^cii« 


rrprpiTPA pprrrT 

y y '^'3 ^^^'-^y y 


1200 


gcctccgtcg 


c a cr t crcr a t cr c 


atacctcaac 


agcaaacc tc 


riCCCICf' Pi \' CIC 
Vw»y v-^ o i^y w 


3 r* a a t~ i" CIC* r* rr 
cLv^cLCi ^ ^y ^^y 


1260 


gaa t gga c gg 


ataacaaccrt 


cr c acr era c a Q c 


era c cr a cf c Cf t cr 

ywv.>yaiy^y i-*y 


1" r* rr 1~ rr i" rr 
L-v^y L>y di^^ciy 


^- cL del \«- ^y y 


1320 


gatgaactca aacgcacgat gtgggactac gtgggcatcg tccgcaccac caagcggctt 1380
cagcgcgcgc aacgacgcat tcgtcacctc cagcaggaaa tcgaagagta ttactggaat 1440
ttcaaggttg agtcctccct tctggagtta cggaatctgg ttgtggtggc ggatctggtt 1500
atccactgtg ccctccaacg ccatgagagc cgtggcctgc attgcacccg ggattatccc 1560
ggcaagttgc ccaccccgat caataccgcc gttcgcagaa gaaccggtta a 1611



<210> 120 
<211> 536 
<212> PRT 
<213> Unknown 

<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (6)...(260)
<223> FAD dependent oxidoreductase
<220>
<221> DOMAIN
<222> (6)...(380)
<223> Pyridine nucleotide-disulphide oxidoreductase
<220>
<221> DOMAIN
<222> (6)...(391)
<223> FAD binding domain
<220>
<221> DOMAIN
<222> (440)...(534)
<223> Fumarate reductase/succinate dehydrogenase flavoprotein C-terminal domain

<400> 120 

Met Gln Lys Arg Tyr Asp Val Ile Val Val Gly Ser Gly Ile Ala Gly
1 5 10 15

Leu Ser Phe Ala Leu Lys Val Ala Lys Ala Gly His Arg Val Gly Ile
20 25 30

Leu Thr Lys Lys Asp Arg Ala Glu Ser Asn Thr Asn Tyr Ala Gln Gly
35 40 45

Gly Ile Ala Ala Val Thr Ser Gln Thr Asp Asp Phe Glu Leu His Val
50 55 60

Gln Asp Thr Leu Thr Ala Gly Asp Gly Leu Cys Asp Glu Ala Val Val
65 70 75 80

Arg Thr Ile Ile Gly Glu Ala Pro Ala Arg Ile Gln Glu Leu Ile Asp
85 90 95

Leu Gly Val Ala Phe Ser His Leu Glu Asp Gly Arg Val Ser Leu His
100 105 110

Arg Glu Gly Gly His Ser Lys Arg Arg Ile Leu His Val Gln Asp Val
115 120 125

Thr Gly Lys Ala Ile Glu Glu Ala Leu Leu His Ala Ile Glu Gln Ser
130 135 140

Pro Leu Ile Asp Leu Asn Glu His Val Phe Ala Ile Asp Leu Leu Thr
145 150 155 160

Glu Arg Lys Leu Ala Leu Ala Gly Phe Glu Val Glu Gly Ala Lys Asn
165 170 175

Arg Val Val Gly Leu Tyr Ala Leu Asp Glu Ala Thr Gln Glu Val His
180 185 190

Val Phe Glu Ala Pro Val Val Met Leu Ala Thr Gly Gly Val Gly Gln
195 200 205

Val Tyr Leu Tyr Ser Thr Asn Pro Arg Ile Ala Thr Gly Asp Gly Leu
210 215 220

Ala Met Ala Tyr Arg Ala Gly Ala Glu Ile Arg Asn Leu Glu Cys Ile
225 230 235 240

Gln Phe His Pro Thr Ala Leu Tyr Thr Thr Thr Asn Asp Arg Phe Leu
245 250 255

Ile Ser Glu Ala Val Arg Gly Glu Gly Ala Ile Leu Arg Asn Gln Glu
260 265 270

Gly Glu Ala Phe Met Ala Arg Tyr Asp Asp Arg Lys Asp Leu Ala Pro
275 280 285

Arg Asp Ile Val Ala Arg Ala Ile Asp Ser Glu Met Lys Gln Ser Gly
290 295 300

Ser Ser His Val Trp Leu Asp Ile Thr His Arg Asp Glu Thr Asp Leu
305 310 315 320

Arg Glu Arg Phe Pro Asn Ile Phe Glu Ala Cys Leu Lys Val Gly Val
325 330 335

Asn Met Ala Gln Ser Ser Ile Pro Val Val Pro Ala Met His Tyr Leu
340 345 350

Cys Gly Gly Val Ala Thr Asp Leu Asn Ala Ala Thr Asp Ile Thr Gly
355 360 365

Leu Phe Ala Cys Gly Glu Val Ala Cys Thr Gly Leu His Gly Ala Asn
370 375 380

Arg Leu Ala Ser Asn Ser Leu Leu Glu Ala Val Val Met Ala His Arg
385 390 395 400

Ala Ser Val Ala Val Asp Ala Tyr Leu Asn Ser Lys Pro His Arg Tyr
405 410 415

Ala Gln Leu Pro Glu Trp Thr Asp Gly Asn Val Gln Asp Ser Asp Glu
420 425 430

Arg Val Val Ile Ser His Asn Trp Asp Glu Leu Lys Arg Thr Met Trp
435 440 445

Asp Tyr Val Gly Ile Val Arg Thr Thr Lys Arg Leu Gln Arg Ala Gln
450 455 460

Arg Arg Ile Arg His Leu Gln Gln Glu Ile Glu Glu Tyr Tyr Trp Asn
465 470 475 480

Phe Lys Val Glu Ser Ser Leu Leu Glu Leu Arg Asn Leu Val Val Val
485 490 495

Ala Asp Leu Val Ile His Cys Ala Leu Gln Arg His Glu Ser Arg Gly
500 505 510

Leu His Cys Thr Arg Asp Tyr Pro Gly Lys Leu Pro Thr Pro Ile Asn
515 520 525

Thr Ala Val Arg Arg Arg Thr Gly
530 535

<210> 121 
<211> 990 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 



<400> 121 
atgccttttg atgccattgg agaaagcttc cgtgccagcc agcaactccc gctgatcaag 60
gtcgacggca accgtttcgt gattgcggag accggtgagc cgatcgtctt ccggggcgtc 120
tccgcctccg acccggctgc gctactggaa cgcggtcaat ggggtcgccg ttactttgaa 180
gagatggcca agtggaatgc caacgttgtg cgcattcctg ttcacccggc agactggcgt 240
aatctcggcg aagacatcta tctcgcccta ctcgaccagg cgattgaatg gtcggctgaa 300
ctcggcatgc acgtcatcat cgactggcac actatcggca atattctgac cggtatttat 360
caccgcgaca tttatgaaac cacccgtgat gagacttacc gtttttggta caccatcgcc 420
attcgttatc agggtaaccc gacagtggcc ttttatgaac tctacaatga gcccaccaac 480
cgaggcggtc gcatgggccc ccttccctgg gaagaatatg cccagttcat cgaagggctg 540
atttccatgc tctacgccat cgacgacacc gttattccac tggtcgctgg cttcgactgg 600
ggatatgatt tgagctatgt tgcggaacgc ccgatccgtt ttccaggagt cgcctatgtc 660
acccaccctt acccgcagaa gcgccccgag ccttgggaac cgatctggca ggaggaatgg 720
ggttttgtcg ccgacaccta tcccatgatc gccactgagt ttggcttcat gagtgaggac 780
ggtcccggag cccacaaccc ggttatcggg gatgaacact atggcgaatc ggtcatccgc 840
tttttcgagg aacgcggcat ttcctggacg gcctgggtgt ttgatcctct ctggtcaccc 900
cagcttttcg aagactggga aacctatacc cccacccggc aaggccgatt ctttaaacag 960
aaaatgatgg aactgaatcc cccgcgttga 990

<210> 122 
<211> 329 
<212> PRT 
<213> Unknown 

<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (25)...(302)
<223> Cellulase (glycosyl hydrolase family 5)
<400> 122

Met Pro Phe Asp Ala Ile Gly Glu Ser Phe Arg Ala Ser Gln Gln Leu
1 5 10 15

Pro Leu Ile Lys Val Asp Gly Asn Arg Phe Val Ile Ala Glu Thr Gly
20 25 30

Glu Pro Ile Val Phe Arg Gly Val Ser Ala Ser Asp Pro Ala Ala Leu
35 40 45

Leu Glu Arg Gly Gln Trp Gly Arg Arg Tyr Phe Glu Glu Met Ala Lys
50 55 60

Trp Asn Ala Asn Val Val Arg Ile Pro Val His Pro Ala Asp Trp Arg
65 70 75 80

Asn Leu Gly Glu Asp Ile Tyr Leu Ala Leu Leu Asp Gln Ala Ile Glu
85 90 95

Trp Ser Ala Glu Leu Gly Met His Val Ile Ile Asp Trp His Thr Ile
100 105 110

Gly Asn Ile Leu Thr Gly Ile Tyr His Arg Asp Ile Tyr Glu Thr Thr
115 120 125

Arg Asp Glu Thr Tyr Arg Phe Trp Tyr Thr Ile Ala Ile Arg Tyr Gln
130 135 140

Gly Asn Pro Thr Val Ala Phe Tyr Glu Leu Tyr Asn Glu Pro Thr Asn
145 150 155 160

Arg Gly Gly Arg Met Gly Pro Leu Pro Trp Glu Glu Tyr Ala Gln Phe
165 170 175

Ile Glu Gly Leu Ile Ser Met Leu Tyr Ala Ile Asp Asp Thr Val Ile
180 185 190

Pro Leu Val Ala Gly Phe Asp Trp Gly Tyr Asp Leu Ser Tyr Val Ala
195 200 205

Glu Arg Pro Ile Arg Phe Pro Gly Val Ala Tyr Val Thr His Pro Tyr
210 215 220

Pro Gln Lys Arg Pro Glu Pro Trp Glu Pro Ile Trp Gln Glu Glu Trp
225 230 235 240

Gly Phe Val Ala Asp Thr Tyr Pro Met Ile Ala Thr Glu Phe Gly Phe
245 250 255

Met Ser Glu Asp Gly Pro Gly Ala His Asn Pro Val Ile Gly Asp Glu
260 265 270

His Tyr Gly Glu Ser Val Ile Arg Phe Phe Glu Glu Arg Gly Ile Ser
275 280 285

Trp Thr Ala Trp Val Phe Asp Pro Leu Trp Ser Pro Gln Leu Phe Glu
290 295 300

Asp Trp Glu Thr Tyr Thr Pro Thr Arg Gln Gly Arg Phe Phe Lys Gln
305 310 315 320

Lys Met Met Glu Leu Asn Pro Pro Arg
325



<210> 123 
<211> 1398 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 123 

atgccgatga gcacagaaac gacttttcct tctgatttca cctggggcgc agcaacagcc 60
gcctaccaga tcgaaggggg cgatcgcgct ggcgggcgcg gccgttccgt gtgggacatg 120
ttttgcgaga aacgaggagc tatttgggag gggcatacgg ggcagcgagc gagtctgcat 180
cttcagcgct ggcgtgagga cgtaatgttg atgcaacagc tcggactgcg gggctatcgt 240
tttagcgtca gctggccgcg cgtcttcccg acaggagtcg gcaaagtcaa ccgtgaaggg 300
ttggcctttt acgatcagct cgtagacgcc ttgctcgagg ccggcatcac cccctttata 360
acgctatttc attgggactt cccgctcgat ttgtaccacc gaggcggctg gttgaatcgc 420
gacagcgccg actggtttgc ctcctacgcc gagtgcctcg gcaaggcact gggcgacagg 480
gtcaagcact gggtgaccct caacgagccg caggttttca taggcctcgg tcattacgaa 540
gggcgtcatg ccccggggtt gaagctctcc atcgcggaaa tgctgcgctg cgggcaccac 600
gccttgctcg cgcacgggaa ggccgtgcaa gccctgcgcg cttccgtcga cggcccctgc 660
aagattggat ttgctccggt ggggattccc aagcttccgg cgagtgagag ctcagaggat 720
atcgccgcgg cacgaaaggc ccagttcgcg gcgggagcgc cgccgtattg gacgctgagc 780
tggtgggcgg atccggtgtt tcaggggaca tatcccgctg atgcctgcca ggctctcgga 840
gcggacgcgc cgcaggtggc cgatcacgac atgagcatca tcagcgagcc gactgatttc 900
ctgggcctca acctttatca aggggtggtg gtgcgtgccg atcacacggg tcaaccagaa 960
acggtgccgt ttccgccggg attccccgtg actgcgctca actgggccgt aaccccagag 1020
gcgctgtatt ggggcccgcg ctttgccttc gaacgctaca aaaagccgat tcacatcacg 1080
gaaaacgggc tatcctgtcg tgactggccg tcgctcgacg ggcacgtcca cgacgccgac 1140
cgcatcgact tcatggcccg gcacttgcgc gcagcgcatc gagccattcg cgatgggata 1200
ccgatcgaag gctacttcca ctggtctgcg atcgacaact tcgagtgggc agaaggctac 1260
aaggaacgct tcgggctcat ttacgtcgac tatacgagcg gcgagcgcat tccgaaggac 1320
tcgtaccact ggtaccagaa ggtcattgcc tccgaggggc gggcagcgct cggcgcgccc 1380
agtgctgctc gcccataa 1398



<210> 124 
<211> 465 
<212> PRT 
<213> Unknown 

<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (5)...(454)
<223> Glycosyl hydrolase family 1
<220>
<221> SITE
<222> (13)...(27)
<223> Glycosyl hydrolases family 1 N-terminal signature, Prosite id = PS00653

<400> 124 

Met Pro Met Ser Thr Glu Thr Thr Phe Pro Ser Asp Phe Thr Trp Gly
1 5 10 15

Ala Ala Thr Ala Ala Tyr Gln Ile Glu Gly Gly Asp Arg Ala Gly Gly
20 25 30

Arg Gly Arg Ser Val Trp Asp Met Phe Cys Glu Lys Arg Gly Ala Ile
35 40 45

Trp Glu Gly His Thr Gly Gln Arg Ala Ser Leu His Leu Gln Arg Trp
50 55 60

Arg Glu Asp Val Met Leu Met Gln Gln Leu Gly Leu Arg Gly Tyr Arg
65 70 75 80

Phe Ser Val Ser Trp Pro Arg Val Phe Pro Thr Gly Val Gly Lys Val
85 90 95

Asn Arg Glu Gly Leu Ala Phe Tyr Asp Gln Leu Val Asp Ala Leu Leu
100 105 110

Glu Ala Gly Ile Thr Pro Phe Ile Thr Leu Phe His Trp Asp Phe Pro
115 120 125

Leu Asp Leu Tyr His Arg Gly Gly Trp Leu Asn Arg Asp Ser Ala Asp
130 135 140

Trp Phe Ala Ser Tyr Ala Glu Cys Leu Gly Lys Ala Leu Gly Asp Arg
145 150 155 160

Val Lys His Trp Val Thr Leu Asn Glu Pro Gln Val Phe Ile Gly Leu
165 170 175

Gly His Tyr Glu Gly Arg His Ala Pro Gly Leu Lys Leu Ser Ile Ala
180 185 190

Glu Met Leu Arg Cys Gly His His Ala Leu Leu Ala His Gly Lys Ala
195 200 205

Val Gln Ala Leu Arg Ala Ser Val Asp Gly Pro Cys Lys Ile Gly Phe
210 215 220

Ala Pro Val Gly Ile Pro Lys Leu Pro Ala Ser Glu Ser Ser Glu Asp
225 230 235 240

Ile Ala Ala Ala Arg Lys Ala Gln Phe Ala Ala Gly Ala Pro Pro Tyr
245 250 255

Trp Thr Leu Ser Trp Trp Ala Asp Pro Val Phe Gln Gly Thr Tyr Pro
260 265 270

Ala Asp Ala Cys Gln Ala Leu Gly Ala Asp Ala Pro Gln Val Ala Asp
275 280 285

His Asp Met Ser Ile Ile Ser Glu Pro Thr Asp Phe Leu Gly Leu Asn
290 295 300

Leu Tyr Gln Gly Val Val Val Arg Ala Asp His Thr Gly Gln Pro Glu
305 310 315 320

Thr Val Pro Phe Pro Pro Gly Phe Pro Val Thr Ala Leu Asn Trp Ala
325 330 335

Val Thr Pro Glu Ala Leu Tyr Trp Gly Pro Arg Phe Ala Phe Glu Arg
340 345 350

Tyr Lys Lys Pro Ile His Ile Thr Glu Asn Gly Leu Ser Cys Arg Asp
355 360 365

Trp Pro Ser Leu Asp Gly His Val His Asp Ala Asp Arg Ile Asp Phe
370 375 380

Met Ala Arg His Leu Arg Ala Ala His Arg Ala Ile Arg Asp Gly Ile
385 390 395 400

Pro Ile Glu Gly Tyr Phe His Trp Ser Ala Ile Asp Asn Phe Glu Trp
405 410 415

Ala Glu Gly Tyr Lys Glu Arg Phe Gly Leu Ile Tyr Val Asp Tyr Thr
420 425 430

Ser Gly Glu Arg Ile Pro Lys Asp Ser Tyr His Trp Tyr Gln Lys Val
435 440 445

Ile Ala Ser Glu Gly Arg Ala Ala Leu Gly Ala Pro Ser Ala Ala Arg
450 455 460

Pro
465



<210> 125 
<211> 1350 
<212> DNA
<213> Unknown
<220>
<223> Obtained from environmental sample



<400> 125 



atgtcagatg ccgccccgac tgatccgaaa tccgcaatgc ccagacgctc ggacttcccc 60
gagggttttg tcttcggcgc ggccaccgcg gcctatcaga tcgagggcca tgccttcggc 120
ggcgcgggcc cctgccattg ggacagcttc gccgcaaccg ggcgtaacgt ggtcggcaat 180
gaggatggcg cgcgcgcctg cgagcattac acccgctggc cgcaggatct ggacctgatc 240
cgcgaggccg ggctcgacgc ctaccgcttc tcgacctcct gggcgcgggt gatgcccgat 300
ggcgtgaccc tgaaccccga ggggctggat ttctacgacc gcctcgtcga tggcatgctc 360
gagcgcgggc taaagcccta tctcaccctc taccattggg aattgccctc ggcgcttgcc 420
gacaggggcg gctggaccaa tcgcgacacg gccgagcgct ttgccgattt cgcagcggtg 480
gtgatggagc ggttgggcag ccgcgtcgcc cgcacggcca ccatcaacga gccatggtgc 540
gtgagctggc tctcgcattt cgaaggccat cacgcgccgg gcctgcgcga catccgtgcc 600
accgcacgcg ccatgcatca tgtgcaactg gcgcacggcc tcgcgctcgg gaagctgcgc 660
gcgcaggggc atggcaatct cggcatcgtg ctgaatttct cggaaatcat tcccgccggg 720
cgagagcacg cgaaggcggc tgatctcggc gacgcaatct cgaaccgctg gttcatcgag 780
tcagtcgcgc gtggcaccta tcccgatgtg gtcctcgagg gtctgggcaa gcacatgccc 840
gagggctggc aggatgacat gaaaaccatc gcggccccgc tcgactggct gggtgtgaac 900
tactacaccc gcggcatcgt cgcgcatgac ccggacgcgt cctggccctc gacccgagcg 960
gaggaggggc ccctgcccaa gacgcagatg ggctgggaga tctaccccga gggcttgcgc 1020
aacctgctgg tgcgcatggc gcgcgactat gtgggcgacc ttcccatggt cgtgaccgaa 1080
aacgggatgg cctgggccga cgaggtcgcg gatggcgccg tcagagatac gatccgcacc 1140
gaatatgtcg cagcccatct caacgcgacc cgcgaggcgc tggccggcgg ggcgaatatc 1200
gaaggtttct tctattggtc gctgctcgac aattacgaat gggccttcgg ctatgccaag 1260
cgcttcggcc tcgtccatgt cgatttcgac acgatggcac gcacgccgaa agcctcctac 1320
cacgcgctga gggccgcgct gcagggttga 1350



<210> 126 
<211> 449 
<212> PRT 
<213> Unknown 

<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (15)...(443)
<223> Glycosyl hydrolase family 1
<220>
<221> SITE
<222> (235)...(238)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (361)...(369)
<223> Glycosyl hydrolases family 1 active site. Prosite id = PS00572
<220>
<221> SITE
<222> (393)...(396)
<223> N-glycosylation site. Prosite id = PS00001
<400> 126

Met Ser Asp Ala Ala Pro Thr Asp Pro Lys Ser Ala Met Pro Arg Arg
1 5 10 15

Ser Asp Phe Pro Glu Gly Phe Val Phe Gly Ala Ala Thr Ala Ala Tyr
20 25 30

Gln Ile Glu Gly His Ala Phe Gly Gly Ala Gly Pro Cys His Trp Asp
35 40 45

Ser Phe Ala Ala Thr Gly Arg Asn Val Val Gly Asn Glu Asp Gly Ala
50 55 60

Arg Ala Cys Glu His Tyr Thr Arg Trp Pro Gln Asp Leu Asp Leu Ile
65 70 75 80

Arg Glu Ala Gly Leu Asp Ala Tyr Arg Phe Ser Thr Ser Trp Ala Arg
85 90 95

Val Met Pro Asp Gly Val Thr Leu Asn Pro Glu Gly Leu Asp Phe Tyr
100 105 110

Asp Arg Leu Val Asp Gly Met Leu Glu Arg Gly Leu Lys Pro Tyr Leu
115 120 125

Thr Leu Tyr His Trp Glu Leu Pro Ser Ala Leu Ala Asp Arg Gly Gly
130 135 140

Trp Thr Asn Arg Asp Thr Ala Glu Arg Phe Ala Asp Phe Ala Ala Val
145 150 155 160

Val Met Glu Arg Leu Gly Ser Arg Val Ala Arg Thr Ala Thr Ile Asn
165 170 175

WO 2006/101584 PCT/US2006/002516

Glu Pro Trp Cys Val Ser Trp Leu Ser His Phe Glu Gly His His Ala
180 185 190

Pro Gly Leu Arg Asp Ile Arg Ala Thr Ala Arg Ala Met His His Val
195 200 205

Gln Leu Ala His Gly Leu Ala Leu Gly Lys Leu Arg Ala Gln Gly His
210 215 220

Gly Asn Leu Gly Ile Val Leu Asn Phe Ser Glu Ile Ile Pro Ala Gly
225 230 235 240

Arg Glu His Ala Lys Ala Ala Asp Leu Gly Asp Ala Ile Ser Asn Arg
245 250 255

Trp Phe Ile Glu Ser Val Ala Arg Gly Thr Tyr Pro Asp Val Val Leu
260 265 270

Glu Gly Leu Gly Lys His Met Pro Glu Gly Trp Gln Asp Asp Met Lys
275 280 285

Thr Ile Ala Ala Pro Leu Asp Trp Leu Gly Val Asn Tyr Tyr Thr Arg
290 295 300

Gly Ile Val Ala His Asp Pro Asp Ala Ser Trp Pro Ser Thr Arg Ala
305 310 315 320

Glu Glu Gly Pro Leu Pro Lys Thr Gln Met Gly Trp Glu Ile Tyr Pro
325 330 335

Glu Gly Leu Arg Asn Leu Leu Val Arg Met Ala Arg Asp Tyr Val Gly
340 345 350

Asp Leu Pro Met Val Val Thr Glu Asn Gly Met Ala Trp Ala Asp Glu
355 360 365

Val Ala Asp Gly Ala Val Arg Asp Thr Ile Arg Thr Glu Tyr Val Ala
370 375 380

Ala His Leu Asn Ala Thr Arg Glu Ala Leu Ala Gly Gly Ala Asn Ile
385 390 395 400

Glu Gly Phe Phe Tyr Trp Ser Leu Leu Asp Asn Tyr Glu Trp Ala Phe
405 410 415

Gly Tyr Ala Lys Arg Phe Gly Leu Val His Val Asp Phe Asp Thr Met
420 425 430

Ala Arg Thr Pro Lys Ala Ser Tyr His Ala Leu Arg Ala Ala Leu Gln
435 440 445

Gly



<210> 127 
<211> 774 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 127 



atggacttgc agctaggcgg aaagcgcgtg ctgatcacgg gtgcgtccaa aggcatcggc 60
ctggcctgcg ccgtcgcctt tgcgcgcgag ggtgccgacc cgattctggt ggcgcgcgat 120
gatgcggcgt tgcatcacgc cacgtccgcc atccgcgaac aaagcggccg cgcggcacat 180
gccatcacgc tggacctggc cctgcctggc gcggcggaaa agctggccaa ggaaaccggc 240
cccatcgaca tactggtcaa caacgcgggc gcggtgcccg gcggcgcgct ggaccaggtg 300
caagacgaac gctggcgcgc gggctgggaa ttgaaagtgc acggctacat cagcctggcg 360
cgctgctact acccgcacat gcgcgaagcg ggcgcgggcg tcatcgccaa catcatcggc 420
atggcgggcg cggcgccccg cgccgactac atctgcggcg cggcggccaa tgcctcactg 480
attgccttta cccgcgcgct gggtggcgaa gcgccccgcc acggcgtgcg cgtctttggc 540
gtcaacccct cgcgcacgcg gaccgaccgc gtgctgaccc tggcccggca acgcgcgcag 600
gcgcgctggg gcgacgaaac gcgttggcag gaaacgctgt cggacctgcc cttcaaccgg 660
ctgatggaac ccgccgaagt ggccgacatg attgtgttcg gcgcctcgcc acgcgcgggt 720
tacctgagcg gcacggtcat cgacctggac ggcggcgaac agtacgcgaa atag 774



<210> 128 
<211> 257 
<212> PRT 
<213> Unknown 



<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (8)...(172)
<223> short chain dehydrogenase
<220>
<221> SITE
<222> (159)...(162)
<223> N-glycosylation site. Prosite id = PS00001
<400> 128

Met Asp Leu Gln Leu Gly Gly Lys Arg Val Leu Ile Thr Gly Ala Ser
1 5 10 15

Lys Gly Ile Gly Leu Ala Cys Ala Val Ala Phe Ala Arg Glu Gly Ala
20 25 30

Asp Pro Ile Leu Val Ala Arg Asp Asp Ala Ala Leu His His Ala Thr
35 40 45

Ser Ala Ile Arg Glu Gln Ser Gly Arg Ala Ala His Ala Ile Thr Leu
50 55 60

Asp Leu Ala Leu Pro Gly Ala Ala Glu Lys Leu Ala Lys Glu Thr Gly
65 70 75 80

Pro Ile Asp Ile Leu Val Asn Asn Ala Gly Ala Val Pro Gly Gly Ala
85 90 95

Leu Asp Gln Val Gln Asp Glu Arg Trp Arg Ala Gly Trp Glu Leu Lys
100 105 110

Val His Gly Tyr Ile Ser Leu Ala Arg Cys Tyr Tyr Pro His Met Arg
115 120 125

Glu Ala Gly Ala Gly Val Ile Ala Asn Ile Ile Gly Met Ala Gly Ala
130 135 140

Ala Pro Arg Ala Asp Tyr Ile Cys Gly Ala Ala Ala Asn Ala Ser Leu
145 150 155 160

Ile Ala Phe Thr Arg Ala Leu Gly Gly Glu Ala Pro Arg His Gly Val
165 170 175

Arg Val Phe Gly Val Asn Pro Ser Arg Thr Arg Thr Asp Arg Val Leu
180 185 190

Thr Leu Ala Arg Gln Arg Ala Gln Ala Arg Trp Gly Asp Glu Thr Arg
195 200 205

Trp Gln Glu Thr Leu Ser Asp Leu Pro Phe Asn Arg Leu Met Glu Pro
210 215 220

Ala Glu Val Ala Asp Met Ile Val Phe Gly Ala Ser Pro Arg Ala Gly
225 230 235 240

Tyr Leu Ser Gly Thr Val Ile Asp Leu Asp Gly Gly Glu Gln Tyr Ala
245 250 255

Lys



<210> 129 
<211> 747 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 129 



atgcccaaag tcatgctcgt taccggcggc agccgtggca tcggcgccgc cgtcgccaag 60
ctggccgcgc gccgcggcta cgcggtcggc atcaactacc gcacccattc cgacgccgcc 120
gacgccgtcg tggccgaaat ccagcaggcg ggcggcaccg cgctggccat ccaggccgac 180
gtgtcgcagg aagacgatgt gctgcacatg ttccgcacgc tggacgagcg cctgggccgc 240
atcgacgcgc tggtcaataa cgccggcatc ctggaaacgc agatgcgcct ggaccagatg 300
gaagccgacc gcctgctgcg cgtgctgtcc accaacgtca tcggcgcttt cctatgtgcg 360
cgcgaagccg tgcgcaggat gtcgacgcgc catggcggcg tgggcggcgc catcgtcaac 420
gtgtcttcgg cggcggcgcg cctgggctcg cccaatgaat acgtggatta cgcggcctcc 480
aagggcgcgc tggacacgat gaccatcggc ctgtcgaaag aggtggcgcc cgaaggtatc 540
cgcgtgaatg gcgtgcgccc cggcaccatc tacaccgaca tgcacgcaag cggcggcgag 600
ccgggccggg tggatcgcct gaaaagcgtg atcccgctgc ggcgcggcgg ctcggtggaa 660
gaagtggcgg gcgccgtcat gtggctgttt tccgaagaag ccggctatac cagcggttcg 720
ttcatcgacg tgtccggcgg tagttga 747



<210> 130 
<211> 248 
<212> PRT 
<213> Unknown 



<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (3)...(176)
<223> short chain dehydrogenase
<220>
<221> SITE
<222> (142)...(145)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (146)...(174)
<223> Short-chain dehydrogenases/reductases family signature. Prosite id = PS00061

<400> 130 

Met Pro Lys Val Met Leu Val Thr Gly Gly Ser Arg Gly Ile Gly Ala
1 5 10 15

Ala Val Ala Lys Leu Ala Ala Arg Arg Gly Tyr Ala Val Gly Ile Asn
20 25 30

Tyr Arg Thr His Ser Asp Ala Ala Asp Ala Val Val Ala Glu Ile Gln
35 40 45

Gln Ala Gly Gly Thr Ala Leu Ala Ile Gln Ala Asp Val Ser Gln Glu
50 55 60

Asp Asp Val Leu His Met Phe Arg Thr Leu Asp Glu Arg Leu Gly Arg
65 70 75 80

Ile Asp Ala Leu Val Asn Asn Ala Gly Ile Leu Glu Thr Gln Met Arg
85 90 95

Leu Asp Gln Met Glu Ala Asp Arg Leu Leu Arg Val Leu Ser Thr Asn
100 105 110

Val Ile Gly Ala Phe Leu Cys Ala Arg Glu Ala Val Arg Arg Met Ser
115 120 125

Thr Arg His Gly Gly Val Gly Gly Ala Ile Val Asn Val Ser Ser Ala
130 135 140

Ala Ala Arg Leu Gly Ser Pro Asn Glu Tyr Val Asp Tyr Ala Ala Ser
145 150 155 160

Lys Gly Ala Leu Asp Thr Met Thr Ile Gly Leu Ser Lys Glu Val Ala
165 170 175

Pro Glu Gly Ile Arg Val Asn Gly Val Arg Pro Gly Thr Ile Tyr Thr
180 185 190

Asp Met His Ala Ser Gly Gly Glu Pro Gly Arg Val Asp Arg Leu Lys
195 200 205

Ser Val Ile Pro Leu Arg Arg Gly Gly Ser Val Glu Glu Val Ala Gly
210 215 220

Ala Val Met Trp Leu Phe Ser Glu Glu Ala Gly Tyr Thr Ser Gly Ser
225 230 235 240

Phe Ile Asp Val Ser Gly Gly Ser
245

<210> 131 
<211> 1041 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<400> 131 



gtggaaacct attttcccct gcaccgcggg atcaacatga gccactggct ttcgcaagtg 60
aatgaaaaca ttcccgaccg ttccacctat gtgacggagc gggacctgca atttttgcgg 120
gcagcgggct tcgaccatgt gcgtctgccg atcgatgaga tcgaactctg ggatgaggag 180
ggccatcaga tcgaggaggc ctggcaatac atgcacaact ttatgcgctg gagccgaaag 240
aatgacctcc gggttattct cgacctgcac acggtattgt cccaccactt caacgcgatc 300
aacatgggag aggtcaacac cctctttaat gatcccaagg aacaggaaaa attcctcaat 360
ctctgggagc aaatcatgga tgccgtaggg caccacccca acgagtttct cgcttatgaa 420
atgctcaatg aggcggtcgc ggaagatgat gaagactgga acctgctcct caaccgtgcg 480
attgaacgca tccgggaacg tgagccgcat cgcgttctga ttgccggggc caactggtgg 540
cagcatgccg cccgcgttcc caacctgagg cttccccctg gtgatcccaa catcatcatc 600
agttttcact tttactcacc ctttctcttc acgcactatc gcagcagctg gactgccatg 660
cgggcatacc agggtttcgt ccaatacccc ggcattacca ttcccgccat ccatctcgaa 720
ggaatgaact atccggagtc ctttgtccaa atgtgggaag agcacaatca gtattacgac 780
atccattcaa tgtatgccga aatggtcccg gcggtgcgtt ttgccgaaaa gctgggcctt 840
cggctctatt gcggcgaatt tggagccatg aagaccgttg atcgtgccca aatgctgcag 900
tggtatcggg atgtggtcag agtctttgaa atgttggaca ttccctacac tgcctgggat 960
tatcagggaa cctttggaat ccgcgatgag ctgaccggtg agcctgatca tgaactgatc 1020
gacattctcc tcggccgcta a 1041
<210> 132
<211> 346
<212> PRT
<213> Unknown

<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (14)...(325)
<223> Cellulase (glycosyl hydrolase family 5)
<220>
<221> SITE
<222> (12)...(15)
<223> N-glycosylation site. Prosite id = PS00001
<400> 132

Met Glu Thr Tyr Phe Pro Leu His Arg Gly Ile Asn Met Ser His Trp
1 5 10 15

Leu Ser Gln Val Asn Glu Asn Ile Pro Asp Arg Ser Thr Tyr Val Thr
20 25 30

Glu Arg Asp Leu Gln Phe Leu Arg Ala Ala Gly Phe Asp His Val Arg
35 40 45

Leu Pro Ile Asp Glu Ile Glu Leu Trp Asp Glu Glu Gly His Gln Ile
50 55 60

Glu Glu Ala Trp Gln Tyr Met His Asn Phe Met Arg Trp Ser Arg Lys
65 70 75 80

Asn Asp Leu Arg Val Ile Leu Asp Leu His Thr Val Leu Ser His His
85 90 95

Phe Asn Ala Ile Asn Met Gly Glu Val Asn Thr Leu Phe Asn Asp Pro
100 105 110

Lys Glu Gln Glu Lys Phe Leu Asn Leu Trp Glu Gln Ile Met Asp Ala
115 120 125

Val Gly His His Pro Asn Glu Phe Leu Ala Tyr Glu Met Leu Asn Glu
130 135 140

Ala Val Ala Glu Asp Asp Glu Asp Trp Asn Leu Leu Leu Asn Arg Ala
145 150 155 160

Ile Glu Arg Ile Arg Glu Arg Glu Pro His Arg Val Leu Ile Ala Gly
165 170 175

Ala Asn Trp Trp Gln His Ala Ala Arg Val Pro Asn Leu Arg Leu Pro
180 185 190

Pro Gly Asp Pro Asn Ile Ile Ile Ser Phe His Phe Tyr Ser Pro Phe
195 200 205

Leu Phe Thr His Tyr Arg Ser Ser Trp Thr Ala Met Arg Ala Tyr Gln
210 215 220

Gly Phe Val Gln Tyr Pro Gly Ile Thr Ile Pro Ala Ile His Leu Glu
225 230 235 240

Gly Met Asn Tyr Pro Glu Ser Phe Val Gln Met Trp Glu Glu His Asn
245 250 255

Gln Tyr Tyr Asp Ile His Ser Met Tyr Ala Glu Met Val Pro Ala Val
260 265 270

Arg Phe Ala Glu Lys Leu Gly Leu Arg Leu Tyr Cys Gly Glu Phe Gly
275 280 285

Ala Met Lys Thr Val Asp Arg Ala Gln Met Leu Gln Trp Tyr Arg Asp
290 295 300

Val Val Arg Val Phe Glu Met Leu Asp Ile Pro Tyr Thr Ala Trp Asp
305 310 315 320

Tyr Gln Gly Thr Phe Gly Ile Arg Asp Glu Leu Thr Gly Glu Pro Asp
325 330 335

His Glu Leu Ile Asp Ile Leu Leu Gly Arg
340 345



<210> 133 
<211> 1377 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<400> 133 

atgacacaac tggcttttcc atctaacttc atctggggaa cagctacttc cgcttaccaa 60 
atcgaaggcg cctggaacgc agacggcaag ggcgaatcta tttgggatcg cttttcccat 120 



acgcagggga agatcattga cggcagcaac ggcgatgtgg cctgcgatca ctaccaccgc 180
tggcgcgagg


acgtggccct 


catgagagac ttgggtatgc 


aggcatatcg 




240 


tcctggccac 


gcatcctgcc 


caccggtcat ggacagatca 




y Vw- ^ y y l, L. 


3 00 


tacaatcgcc 


tggtggacgg 


gttgctggaa gctggcatca 


agccct ttgc 




7 fin 

J o u 


cactgggacc 


tgccgctggc 


gctacaggct gacggcggct 


ciQcccTaaacCT 
3 3 ^ ^ S3 » "-y ^ y 






aaggcctttg 


tcgaatacgc 


cgacgtggtc agccgcgcgc 




y y v-y dciy ciy 


^ o u 


tggatcaccc 


ataacgaacc 


gtggtgcatc agcatgctga 


gccatcaaat 


^yyyy^-y*--*^^ 


J 


gcgcccggct 


ggcgggactg 


gcaggctgcg ttggcggccg 


cgcaccacgt 


cctc!C't~t"t"prr 

w w \^ o w ^ \^ y 


600 


catggttggg 


ccgtgccgga 


actgcgtcgc aacagccgcg 


atcrcaoaaat 


*^yy^c*i**— dv^y 


D O U 


ttgaacttta 


ccccggcgga 


gccagcttcg aacagcgcag 


cccratttcaa 


rrcr p r" t" a t" p rrr* 


1 £t\J 


cagttcgatg 


gctacttcaa 


ccgctggttc ctggacccgc 




l^k>ClL'UClL.L<v*y 


7 fin 


gcagatatgg 


tgcacgatta 


catcgcgcaa ggctacctgc 


catpar'aoTTfT 






gtggaagctg 


gtgacctgga 


CCfCCratCCfCQ' B.CCICCIC1B.CCCT 


i_ i_ j_ . rrrxrt 


t-y v-gaac cat. 


Q nn 


tacacgcgcg 


aagtggtccg 


tacrccacicraa atcccacracra 


rt't* rT3 cfi^ ^ r^fir^ 


gccgcgcaca 


Q 

y bU 


gtcttgcgcg 


cgccacagga 


acracrtcTcraca craQat'cicfcrr't' 


yyyciciy uy ud 


tec tgagggc 


±UZ U 


ctctacaggt 


tgctcaatcg 


attocactti" crari t'ar»r«arrr' 


^-y v-y caay cu 


ctacgtgacc 


t o n 


gagagcggtt gcagctactc cgatggaccc ggccccaacg gtcggatacc ggaccaacgc 1140
cgtatcaact acctgcgcga tcacttcgca gcggcgcatc aggcgataca atgcggcgtc 1200
ccgctggccg gctacttcgt ctggtcgttc atggacaact tcgagtgggc caaagggtac 1260
acccaacgtt ttggtatcgt atgggtggat tatcaatcgc aacgacggat accgaaagac 1320
agcgcctact ggtatcgcga tgtcgtcgcc gccaacgcgg tgcaagttcc tgattag 1377



<210> 134 
<211> 458 
<212> PRT 
<213> Unknown 

<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (2)...(454)
<223> Glycosyl hydrolase family 1
<220>
<221> SITE
<222> (10)...(24)
<223> Glycosyl hydrolases family 1 N-terminal signature, Prosite id = PS00653

<400> 134 

Met Thr Gln Leu Ala Phe Pro Ser Asn Phe Ile Trp Gly Thr Ala Thr
1 5 10 15

Ser Ala Tyr Gln Ile Glu Gly Ala Trp Asn Ala Asp Gly Lys Gly Glu
20 25 30

Ser Ile Trp Asp Arg Phe Ser His Thr Gln Gly Lys Ile Ile Asp Gly
35 40 45

Ser Asn Gly Asp Val Ala Cys Asp His Tyr His Arg Trp Arg Glu Asp
50 55 60

Val Ala Leu Met Arg Asp Leu Gly Met Gln Ala Tyr Arg Phe Ser Ile
65 70 75 80

Ser Trp Pro Arg Ile Leu Pro Thr Gly His Gly Gln Ile Asn Gln Ala
85 90 95

Gly Leu Asp Phe Tyr Asn Arg Leu Val Asp Gly Leu Leu Glu Ala Gly
100 105 110

Ile Lys Pro Phe Ala Thr Leu Tyr His Trp Asp Leu Pro Leu Ala Leu
115 120 125

Gln Ala Asp Gly Gly Trp Pro Glu Arg Ser Thr Ala Lys Ala Phe Val
130 135 140

Glu Tyr Ala Asp Val Val Ser Arg Ala Leu Gly Asp Arg Val Lys Ser
145 150 155 160

Trp Ile Thr His Asn Glu Pro Trp Cys Ile Ser Met Leu Ser His Gln
165 170 175

Ile Gly Glu His Ala Pro Gly Trp Arg Asp Trp Gln Ala Ala Leu Ala
180 185 190

Ala Ala His His Val Leu Leu Ser His Gly Trp Ala Val Pro Glu Leu
195 200 205

Arg Arg Asn Ser Arg Asp Ala Glu Ile Gly Ile Thr Leu Asn Phe Thr
210 215 220

Pro Ala Glu Pro Ala Ser Asn Ser Ala Ala Asp Phe Lys Ala Tyr Arg
225 230 235 240

Gln Phe Asp Gly Tyr Phe Asn Arg Trp Phe Leu Asp Pro Leu Tyr Gly
245 250 255

Arg His Tyr Pro Ala Asp Met Val His Asp Tyr Ile Ala Gln Gly Tyr
260 265 270

Leu Pro Ser Gln Gly Leu Thr Phe Val Glu Ala Gly Asp Leu Asp Ala
275 280 285

Ile Ala Thr Arg Thr Asp Phe Leu Gly Val Asn Tyr Tyr Thr Arg Glu
290 295 300

Val Val Arg Ser Gln Glu Ile Pro Glu Ser Glu Asn Ala Pro Arg Thr
305 310 315 320

Val Leu Arg Ala Pro Gln Glu Glu Trp Thr Glu Met Gly Trp Glu Val
325 330 335

Tyr Pro Glu Gly Leu Tyr Arg Leu Leu Asn Arg Leu His Phe Glu Tyr
340 345 350

Gln Pro Arg Lys Leu Tyr Val Thr Glu Ser Gly Cys Ser Tyr Ser Asp
355 360 365

Gly Pro Gly Pro Asn Gly Arg Ile Pro Asp Gln Arg Arg Ile Asn Tyr
370 375 380

Leu Arg Asp His Phe Ala Ala Ala His Gln Ala Ile Gln Cys Gly Val
385 390 395 400

Pro Leu Ala Gly Tyr Phe Val Trp Ser Phe Met Asp Asn Phe Glu Trp
405 410 415

Ala Lys Gly Tyr Thr Gln Arg Phe Gly Ile Val Trp Val Asp Tyr Gln
420 425 430

Ser Gln Arg Arg Ile Pro Lys Asp Ser Ala Tyr Trp Tyr Arg Asp Val
435 440 445

Val Ala Ala Asn Ala Val Gln Val Pro Asp
450 455



<210> 135 
<211> 987 
<212> DNA 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<400> 135



atggttgagc ctgccgatca gagtcatttt tcagatgctt ttcaggtaaa tcgcactctt 60

ggaaaaggca tcaatcttgg taacacactg gaggctccaa atgaaggcga gtggggattg 120

acaattcgcg aggagtattt tgatgaagtg aaacaagccg gatttgaatc cgtgcgtatt 180

ccgatacgat ggaatgctca tgctctggaa ggttttccat atacgataga tgaatctttt 240

tttgaccggg ttgatgaagt tattggctgg gcttttgatc gtgatcttgc agtcatgatt 300

aacattcatc actacaacga attgatggag cagccacagg atcaccggga tcgctttttg 360

aaactttggg agcaaattgc tgcgcactat aaagagtacc cggaagaact ggtattcgag 420

attttaaacg aaccccacga taatctgacc ccggctatct ggaatagctt tttggctgat 480

gctctcggta ttatacgcca aaccaatcca ggaagggtta ttgcagtcgg aacagctgaa 540

tggggcggtt tcgggagttt gcaggatctt gagctgcctg ataatgaccg ccagataatc 600

accaccgttc attactataa cccatttcat ttcacgcatc agggggcaga ttgggttgga 660


y augcia.y cy g 


atcagtggct tggaaccgaa tgggatggag cagatcatga aaaagctgaa 720

gttgacagcg attttgactc tgtggaacag tgggcccgaa atcatgaccg gccaatacac 780

gtgggagagt tcggagcttt cagcgccgca gatgatttgt cacgtgaaca gtggacggca 840

tacgtacgtg agtcttcgga gaaccggcag tttagctggg cgtattggga gtttgggtca 900

gggttcggtg cctatgatcc cggttccgga gaatggcgtg aatatttact ccgggcgtta 960

atccccgaca gtccggtgat tgattaa 987



<210> 136 
<211> 328 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN
<222> (27)...(306)
<223> Cellulase (glycosyl hydrolase family 5)
<220>
<221> SITE
<222> (17)...(20)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (139)...(148)
<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659
<400> 136

Met Val Glu Pro Ala Asp Gln Ser His Phe Ser Asp Ala Phe Gln Val
1 5 10 15

Asn Arg Thr Leu Gly Lys Gly Ile Asn Leu Gly Asn Thr Leu Glu Ala
20 25 30

Pro Asn Glu Gly Glu Trp Gly Leu Thr Ile Arg Glu Glu Tyr Phe Asp
35 40 45

Glu Val Lys Gln Ala Gly Phe Glu Ser Val Arg Ile Pro Ile Arg Trp
50 55 60

Asn Ala His Ala Leu Glu Gly Phe Pro Tyr Thr Ile Asp Glu Ser Phe
65 70 75 80

Phe Asp Arg Val Asp Glu Val Ile Gly Trp Ala Phe Asp Arg Asp Leu
85 90 95

Ala Val Met Ile Asn Ile His His Tyr Asn Glu Leu Met Glu Gln Pro
100 105 110

Gln Asp His Arg Asp Arg Phe Leu Lys Leu Trp Glu Gln Ile Ala Ala
115 120 125

His Tyr Lys Glu Tyr Pro Glu Glu Leu Val Phe Glu Ile Leu Asn Glu
130 135 140

Pro His Asp Asn Leu Thr Pro Ala Ile Trp Asn Ser Phe Leu Ala Asp
145 150 155 160

Ala Leu Gly Ile Ile Arg Gln Thr Asn Pro Gly Arg Val Ile Ala Val
165 170 175

Gly Thr Ala Glu Trp Gly Gly Phe Gly Ser Leu Gln Asp Leu Glu Leu
180 185 190

Pro Asp Asn Asp Arg Gln Ile Ile Thr Thr Val His Tyr Tyr Asn Pro
195 200 205

Phe His Phe Thr His Gln Gly Ala Asp Trp Val Gly Asp Glu Ala Asp
210 215 220

Gln Trp Leu Gly Thr Glu Trp Asp Gly Ala Asp His Glu Lys Ala Glu
225 230 235 240

Val Asp Ser Asp Phe Asp Ser Val Glu Gln Trp Ala Arg Asn His Asp
245 250 255

Arg Pro Ile His Val Gly Glu Phe Gly Ala Phe Ser Ala Ala Asp Asp
260 265 270

Leu Ser Arg Glu Gln Trp Thr Ala Tyr Val Arg Glu Ser Ser Glu Asn
275 280 285

Arg Gln Phe Ser Trp Ala Tyr Trp Glu Phe Gly Ser Gly Phe Gly Ala
290 295 300

Tyr Asp Pro Gly Ser Gly Glu Trp Arg Glu Tyr Leu Leu Arg Ala Leu
305 310 315 320

Ile Pro Asp Ser Pro Val Ile Asp
325



<210> 137 
<211> 702 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 137 



atgagccacc gatcgcagga attcaacggc cagccactga tggtgtccga agacggccac 60

ttcgtgctcg gattcgggcg cgacgacgag gccacccacc gactgcgcgt tcagctaccg 120

gatgagcgag tctgggagaa gaatctgcgt ccggaatcgc gcgagttcga tattcagcgg 180

atcgacggct tgccgcaaga ccaggtcacc ccaccccact ccgtgctggc gagaatccga 240

gaggacgctt cgctgtcgcg ccgtgcccgc gaacgacgcg atccgcggac cgactggacc 300

gatggctgga tctggccggc cgagggccgc atttccggcg tgtacggcag ccagcgcatc 360

ctcaacggtg agcctcgcaa cccgcactgg gggctggata tcgccgcgcc aaccggcagc 420

ccggtcgtgg cgcctgccgg cggcatcgtc agcctgactc atccggacat gtatttttcc 480

ggcggcaccc tgttaatcga ccacggtcac ggcctggtgt ctgcgttcct ccacctgagt 540

gaaatcctgg tcgaggaagg gcagcgggtc gagcaggggg atctgatcgc acgcattggc 600

gccaccggtc gtgccaccgg gccgcacctg gactggcgga tcaatctcgg cgatgtacgc 660

gtggacccac agctgctgct gccgccgatg gacgcgcagt ga 702



<210> 138 
<211> 233 
<212> PRT 
<213> Unknown 



<220> 
<223> Obtained from environmental sample
<220> 

<221> DOMAIN 

<222> (127) . . . (223) 

<223> Peptidase family M23 

<400> 138 

Met Ser His Arg Ser Gln Glu Phe Asn Gly Gln Pro Leu Met Val Ser
1 5 10 15

Glu Asp Gly His Phe Val Leu Gly Phe Gly Arg Asp Asp Glu Ala Thr
20 25 30

His Arg Leu Arg Val Gln Leu Pro Asp Glu Arg Val Trp Glu Lys Asn
35 40 45

Leu Arg Pro Glu Ser Arg Glu Phe Asp Ile Gln Arg Ile Asp Gly Leu
50 55 60

Pro Gln Asp Gln Val Thr Pro Pro His Ser Val Leu Ala Arg Ile Arg
65 70 75 80

Glu Asp Ala Ser Leu Ser Arg Arg Ala Arg Glu Arg Arg Asp Pro Arg
85 90 95

Thr Asp Trp Thr Asp Gly Trp Ile Trp Pro Ala Glu Gly Arg Ile Ser
100 105 110

Gly Val Tyr Gly Ser Gln Arg Ile Leu Asn Gly Glu Pro Arg Asn Pro
115 120 125

His Trp Gly Leu Asp Ile Ala Ala Pro Thr Gly Ser Pro Val Val Ala
130 135 140

Pro Ala Gly Gly Ile Val Ser Leu Thr His Pro Asp Met Tyr Phe Ser
145 150 155 160

Gly Gly Thr Leu Leu Ile Asp His Gly His Gly Leu Val Ser Ala Phe
165 170 175

Leu His Leu Ser Glu Ile Leu Val Glu Glu Gly Gln Arg Val Glu Gln
180 185 190

Gly Asp Leu Ile Ala Arg Ile Gly Ala Thr Gly Arg Ala Thr Gly Pro
195 200 205

His Leu Asp Trp Arg Ile Asn Leu Gly Asp Val Arg Val Asp Pro Gln
210 215 220

Leu Leu Leu Pro Pro Met Asp Ala Gln
225 230



<210> 139 
<211> 351 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 139 



atggaaaaaa ttctcgttat cggatgcgcg ggccagatag gctcagagct tacgctcgaa 60

cttcgtaaga tttatggtga tgacaatgtg gtggctactg acattaagcc ggccagcaag 120

gaaattaccg agggcggccc ctttgaaatt cttgatgtgc tcgacaccca ccggcttttt 180

ggcactgtaa gccgcaacaa gatcacccag atttatcacc ttgcagccat cctttcgggc 240

aatgccgaga aaaaaccact tgcaagctgg cacattaaca tggagagttt gctcaacgtg 300

cttgaactgg cccgtgaact gaagcttcat aaaattttct ggccaagctc a 351



<210> 140 
<211> 117 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<400> 140 

Met Glu Lys Ile Leu Val Ile Gly Cys Ala Gly Gln Ile Gly Ser Glu
1 5 10 15

Leu Thr Leu Glu Leu Arg Lys Ile Tyr Gly Asp Asp Asn Val Val Ala
20 25 30

Thr Asp Ile Lys Pro Ala Ser Lys Glu Ile Thr Glu Gly Gly Pro Phe
35 40 45

Glu Ile Leu Asp Val Leu Asp Thr His Arg Leu Phe Gly Thr Val Ser
50 55 60

Arg Asn Lys Ile Thr Gln Ile Tyr His Leu Ala Ala Ile Leu Ser Gly
65 70 75 80

Asn Ala Glu Lys Lys Pro Leu Ala Ser Trp His Ile Asn Met Glu Ser
85 90 95

Leu Leu Asn Val Leu Glu Leu Ala Arg Glu Leu Lys Leu His Lys Ile
100 105 110

Phe Trp Pro Ser Ser
115

<210> 141 
<211> 1350 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 141 



atgctgtcct atacgagtcc gttcccaaag aactttgtct ggggtgtggc gacggcggcg 60

ccgcagatcg agggcgctgc gcgagaagac ggaaagggcg aatcgatatg ggatcgcttt 120

tgccgcgtgc ccggaaaggt ccacaatggc gatactctcg atgttgcgtg cgaccactac 180

caccggttcc gggaggattt cgcgctcatg cgagacttgg gcgtgcgcca ctaccggctt 240

tcgcttgcct ggccccgcat attcccggac ggcgacggcg cattgaacca gcgcggagtg 300

gatttctacc accggctctt tgaggccatg atcgagcacg ggattacgcc ttgggtgacg 360

ctctttcact gggatttgcc gcaggcgctc gaggaccgcg gcggctggtg tgagcgtctc 420

accgtcgatg cattcgggcg ctacgctgac accgtggtga aggcgtttgg cgatcgcgtg 480

aagaattgga tcaccctgaa cgaaatccgc tgcttcacgt tgctcgctta cgatctctgc 540

atcaaggccc cgggccgcaa ggtctcgcgg gcgcagctca accagaccta tcatcacgcg 600

ctgatctgcc atgggcatgg cgtccgggcg gtccgcgaac acggcgggcg aggcgctcgc 660

gtcgggctta ccgacaacag cgacgtatgc gtgcccgtca ccgagaccgc gcccgacatc 720

attgcggcca gatcctggta tgcgtcgcga aatattcatc tgctcgatcc gatctatcgc 780

ggcgagtatg cgccggaata cctcgaacgc tgcggtgcgg acgcgcccca ggtggccgag 840

gacgatttcg cgctgatttc aatgccgacg gattttctcg ggctgaatgt atatacggcg 900

acctttgtgc gtgccgacgc ggagggcagg ccggaggaga ttaaactgcc gcggaattac 960

ccgcgcgcgg atagcgcgtg gttgaatatt gtgccccagt cgatgtactg ggccacacgg 1020

ctggcgcggg aaacctacgg cgtgagatca atctacatca ccgaaaacgg ctgcggctac 1080

gacgacgagc ccgtcgacgg cggcgaggtg ctcgacctgc atcgacgcga ttttctgcgc 1140

aaccaccttc gggaattgca tcgcgccata ggcgacggcg tgcccgttga cgggtatttt 1200

ctctggtcct tcatggacaa ctacgagtgg gaggacgggt atgcgcggcg gttcggcatc 1260

gttcacgtcg acttcgaaag ccagaaacgg actccaaaac tctcggcgcg ctattacgcg 1320

caggtaatga aagaaaaccg gatcctgtga 1350

<210> 142 
<211> 449 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN
<222> (4)...(448)

<223> Glycosyl hydrolase family 1 
<400> 142 

Met Leu Ser Tyr Thr Ser Pro Phe Pro Lys Asn Phe Val Trp Gly Val
1 5 10 15

Ala Thr Ala Ala Pro Gln Ile Glu Gly Ala Ala Arg Glu Asp Gly Lys
20 25 30

Gly Glu Ser Ile Trp Asp Arg Phe Cys Arg Val Pro Gly Lys Val His
35 40 45

Asn Gly Asp Thr Leu Asp Val Ala Cys Asp His Tyr His Arg Phe Arg
50 55 60

Glu Asp Phe Ala Leu Met Arg Asp Leu Gly Val Arg His Tyr Arg Leu
65 70 75 80

Ser Leu Ala Trp Pro Arg Ile Phe Pro Asp Gly Asp Gly Ala Leu Asn
85 90 95

Gln Arg Gly Val Asp Phe Tyr His Arg Leu Phe Glu Ala Met Ile Glu
100 105 110

His Gly Ile Thr Pro Trp Val Thr Leu Phe His Trp Asp Leu Pro Gln
115 120 125

Ala Leu Glu Asp Arg Gly Gly Trp Cys Glu Arg Leu Thr Val Asp Ala
130 135 140

Phe Gly Arg Tyr Ala Asp Thr Val Val Lys Ala Phe Gly Asp Arg Val
145 150 155 160

Lys Asn Trp Ile Thr Leu Asn Glu Ile Arg Cys Phe Thr Leu Leu Ala
165 170 175

Tyr Asp Leu Cys Ile Lys Ala Pro Gly Arg Lys Val Ser Arg Ala Gln
180 185 190

Leu Asn Gln Thr Tyr His His Ala Leu Ile Cys His Gly His Gly Val
195 200 205

Arg Ala Val Arg Glu His Gly Gly Arg Gly Ala Arg Val Gly Leu Thr
210 215 220

Asp Asn Ser Asp Val Cys Val Pro Val Thr Glu Thr Ala Pro Asp Ile
225 230 235 240

Ile Ala Ala Arg Ser Trp Tyr Ala Ser Arg Asn Ile His Leu Leu Asp
245 250 255

Pro Ile Tyr Arg Gly Glu Tyr Ala Pro Glu Tyr Leu Glu Arg Cys Gly
260 265 270

Ala Asp Ala Pro Gln Val Ala Glu Asp Asp Phe Ala Leu Ile Ser Met
275 280 285

Pro Thr Asp Phe Leu Gly Leu Asn Val Tyr Thr Ala Thr Phe Val Arg
290 295 300

Ala Asp Ala Glu Gly Arg Pro Glu Glu Ile Lys Leu Pro Arg Asn Tyr
305 310 315 320

Pro Arg Ala Asp Ser Ala Trp Leu Asn Ile Val Pro Gln Ser Met Tyr
325 330 335

Trp Ala Thr Arg Leu Ala Arg Glu Thr Tyr Gly Val Arg Ser Ile Tyr
340 345 350

Ile Thr Glu Asn Gly Cys Gly Tyr Asp Asp Glu Pro Val Asp Gly Gly
355 360 365

Glu Val Leu Asp Leu His Arg Arg Asp Phe Leu Arg Asn His Leu Arg
370 375 380

Glu Leu His Arg Ala Ile Gly Asp Gly Val Pro Val Asp Gly Tyr Phe
385 390 395 400

Leu Trp Ser Phe Met Asp Asn Tyr Glu Trp Glu Asp Gly Tyr Ala Arg
405 410 415

Arg Phe Gly Ile Val His Val Asp Phe Glu Ser Gln Lys Arg Thr Pro
420 425 430

Lys Leu Ser Ala Arg Tyr Tyr Ala Gln Val Met Lys Glu Asn Arg Ile
435 440 445

Leu

<210> 143 
<211> 1188 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 143 

atgaccatca ccttccccga cgggttctgg tgggggacgg cgacggccgc ccaccaggtg 60 

gagggcggca actggaacac cgactggtgg gcctacgagc acgccccggg cacccgctgc 120 

gcggagccgt ccggcgatgc gtgcgaccac tggcaccgct acccggagga catcgccctc 180 

ctcgccgcgc tcgggttcag tgcctaccgc ttctcggtgg aatgggctcg catcgagccc 240 

gaggaagggc atttctcccg cgccaccctc gaccactacc ggcgcatgat cgcctgctgc 300

cgcgaccacg ggctggcccc ggtggtgacc ttccaccact tcaccacccc ccgctgggcc 360

gcggccgggg gctgctggtc cgacccggtc accgccgagc gcttcgcccg ttactgcgag 420 

cgcaccgtgg ccgccctcgg cgacgagatc gcgatggcct gcacgatcaa cgagccgaac 480 

atcgtggcca ccctcgggta cttcctcggc gagttcccgc cggccgtcgc cgaccccgac 540 

cgctaccggc aggcgaacga cacgctgatc cgcgcccatc gcctcgccta cgaggcgctg 600 

aaggccgggc ccggcgagtt ccccgtcggc ctcaccctgt cgatggccga gttcgtcgcc 660 

gagcccggcg gcgaggccca cctcgcccag gtccggcaca cgatggagga catcttcctg 720 

gaggccgccc ggggcgacga cttcatcggg gtgcagacct acagccgcat gcgcttcggt 780 

cccgactcgc cgatcccgct cgggccggcc gagggcgtcg aggtcgtcca gatggggtac 840

gagtactggc cgtgggcgct cgaggcgacg atccggcgcg ccgccgaggt caccggcacg 900 

gcggtccacg tcaccgagaa cggcatcggg accgccgacg acacgcagcg ggtcgcctac 960 

gtcaccgagg ccctccgggg gctgcggcgc tgcctcgacg acggcatcga cgtccgcagc 1020

tacttctact ggacgctgct cgacaacttc gagtggacgc gcggctacgt gccgacgttc 1080 

gggctcgtcg ccgtcgaccg caccacccag cgccggtcgg tgaagccgag cgcggtgtgg 1140 

ctcggcgagg tcgcccgcac gaaccgcctc gagctcccgg accgctga 1188 

<210> 144 
<211> 395 
<212> PRT
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (1) . . . (390) 

<223> Glycosyl hydrolase family 1 
<220> 

<221> SITE 
<222> (9) . . . (23) 

<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653

<220> 

<221> SITE 

<222> (188) . . . (191) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 144 

Met Thr Ile Thr Phe Pro Asp Gly Phe Trp Trp Gly Thr Ala Thr Ala
1 5 10 15

Ala His Gln Val Glu Gly Gly Asn Trp Asn Thr Asp Trp Trp Ala Tyr
20 25 30

Glu His Ala Pro Gly Thr Arg Cys Ala Glu Pro Ser Gly Asp Ala Cys
35 40 45

Asp His Trp His Arg Tyr Pro Glu Asp Ile Ala Leu Leu Ala Ala Leu
50 55 60

Gly Phe Ser Ala Tyr Arg Phe Ser Val Glu Trp Ala Arg Ile Glu Pro
65 70 75 80

Glu Glu Gly His Phe Ser Arg Ala Thr Leu Asp His Tyr Arg Arg Met
85 90 95

Ile Ala Cys Cys Arg Asp His Gly Leu Ala Pro Val Val Thr Phe His
100 105 110

His Phe Thr Thr Pro Arg Trp Ala Ala Ala Gly Gly Cys Trp Ser Asp
115 120 125

Pro Val Thr Ala Glu Arg Phe Ala Arg Tyr Cys Glu Arg Thr Val Ala
130 135 140

Ala Leu Gly Asp Glu Ile Ala Met Ala Cys Thr Ile Asn Glu Pro Asn
145 150 155 160

Ile Val Ala Thr Leu Gly Tyr Phe Leu Gly Glu Phe Pro Pro Ala Val
165 170 175

Ala Asp Pro Asp Arg Tyr Arg Gln Ala Asn Asp Thr Leu Ile Arg Ala
180 185 190

His Arg Leu Ala Tyr Glu Ala Leu Lys Ala Gly Pro Gly Glu Phe Pro
195 200 205

Val Gly Leu Thr Leu Ser Met Ala Glu Phe Val Ala Glu Pro Gly Gly
210 215 220

Glu Ala His Leu Ala Gln Val Arg His Thr Met Glu Asp Ile Phe Leu
225 230 235 240

Glu Ala Ala Arg Gly Asp Asp Phe Ile Gly Val Gln Thr Tyr Ser Arg
245 250 255

Met Arg Phe Gly Pro Asp Ser Pro Ile Pro Leu Gly Pro Ala Glu Gly
260 265 270

Val Glu Val Val Gln Met Gly Tyr Glu Tyr Trp Pro Trp Ala Leu Glu
275 280 285

Ala Thr Ile Arg Arg Ala Ala Glu Val Thr Gly Thr Ala Val His Val
290 295 300

Thr Glu Asn Gly Ile Gly Thr Ala Asp Asp Thr Gln Arg Val Ala Tyr
305 310 315 320

Val Thr Glu Ala Leu Arg Gly Leu Arg Arg Cys Leu Asp Asp Gly Ile
325 330 335

Asp Val Arg Ser Tyr Phe Tyr Trp Thr Leu Leu Asp Asn Phe Glu Trp
340 345 350

Thr Arg Gly Tyr Val Pro Thr Phe Gly Leu Val Ala Val Asp Arg Thr
355 360 365

Thr Gln Arg Arg Ser Val Lys Pro Ser Ala Val Trp Leu Gly Glu Val
370 375 380

Ala Arg Thr Asn Arg Leu Glu Leu Pro Asp Arg
385 390 395

<210> 145
<211> 1386 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 145 

atgtcgtttc cgagaaattt cctgtgggga tcagccacct cctcctacca aatcgaaggc 60 

gcctggcaag aagacggcaa aggcccaaat atctgggacg tgttttcaca caccccgggg 120 

aaagtcgcca atggcgacac cggtgatatc gccatcgacc actaccaccg ataccgagac 180 

gacgttgccc tgatggctga gcttggactt caggcatacc gtttctcgtt ctcctgggcc 240 

agaataatgc cggaaggagc aggccccatc gagcaacggg gtctggactt ctacgaccgc 300 

ctcattgatg cactgctgga gaaaaacatc caacccatgg ccaccctcta ccactgggat 360 

ttaccagccg cactgcaaga cagagggggg tggactaacc gcgacagcgc gtcctggttt 420 

gctgactact cagccgttgt tcacgacgct ttttctgacc gggtgggaat gtgggcaacg 480 

ttgaacgagc cgtgggtgtc tgcatttttg ggccacggaa ctggcatcca cgcacctggc 540 

atcacaagcc cccacgcggc gttcgccgcg gggcatcacc tgcttctggg gcatggcaag 600 

gccatccaag cgatgcgcgc tcaatcgtct agcacccaac tgggaattgt tttgaacctc 660 

gcccccgtgt atctcgaagg tgacacccct gctgaccacc cggctcacac ctccgtggca 720

ctacacgatg ccattttgaa tgggttgtgg acagagccgc ttctgcgctc cagatacccc 780

gacctgcttc ttcaactagg cgacatggtg acaaaaaaca tccacgacgg tgacctcgcc 840 

atcatggccg agccgattga ctggatgggc atcaactact accaggacat tagatttgtg 900 

gccactgatg ttgcccccac ggctaacccg atggcccctc cgggtaacga cctgccgggc 960 

accgtcgggg tggagcctgc gccagcaatc ggaaacatca ccagctttgg ctggtccacc 1020 

acccccgacg gactgcgagt actgttggtg ggcctggatg aggaatacga caacctcccg 1080 

ccgatattca ttaccgaaaa cgggtgtgct tacgattacc ccgtcgagga cggtgtcgtc 1140 

aacgacaccc ttcgtgtcac atacatgcga gaacacctca ccgcgttgtc gcaggccatt 1200

gaggcgggtg tgaatgtccg gggctatatg cactggtctc tgttcgacaa cttcgagtgg 1260 

gccgaagggt atcgccaacg ctttggcatg gtgcacgtcg actttgagac cttggagcgg 1320 

actcccaaag cctcagctca ctactattca cgtgtcatca caaataacgc cctctctgac 1380 

gactga 1386

<210> 146 
<211> 461 
<212> PRT 
<213> Unknown

<220>
<223> Obtained from environmental sample
<220>
<221> DOMAIN
<222> (1)...(458)
<223> Glycosyl hydrolase family 1
<220>
<221> SITE
<222> (7)...(21)
<223> Glycosyl hydrolases family 1 N-terminal signature. Prosite id = PS00653
<220>
<221> SITE
<222> (337)...(340)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (386)...(389)
<223> N-glycosylation site. Prosite id = PS00001
<400> 146 

Met Ser Phe Pro Arg Asn Phe Leu Trp Gly Ser Ala Thr Ser Ser Tyr
1 5 10 15

Gln Ile Glu Gly Ala Trp Gln Glu Asp Gly Lys Gly Pro Asn Ile Trp
20 25 30

Asp Val Phe Ser His Thr Pro Gly Lys Val Ala Asn Gly Asp Thr Gly
35 40 45

Asp Ile Ala Ile Asp His Tyr His Arg Tyr Arg Asp Asp Val Ala Leu
50 55 60

Met Ala Glu Leu Gly Leu Gln Ala Tyr Arg Phe Ser Phe Ser Trp Ala
65 70 75 80

Arg Ile Met Pro Glu Gly Ala Gly Pro Ile Glu Gln Arg Gly Leu Asp
85 90 95

Phe Tyr Asp Arg Leu Ile Asp Ala Leu Leu Glu Lys Asn Ile Gln Pro
100 105 110

Met Ala Thr Leu Tyr His Trp Asp Leu Pro Ala Ala Leu Gln Asp Arg
115 120 125

Gly Gly Trp Thr Asn Arg Asp Ser Ala Ser Trp Phe Ala Asp Tyr Ser
130 135 140

Ala Val Val His Asp Ala Phe Ser Asp Arg Val Gly Met Trp Ala Thr
145 150 155 160

Leu Asn Glu Pro Trp Val Ser Ala Phe Leu Gly His Gly Thr Gly Ile
165 170 175

His Ala Pro Gly Ile Thr Ser Pro His Ala Ala Phe Ala Ala Gly His
180 185 190

His Leu Leu Leu Gly His Gly Lys Ala Ile Gln Ala Met Arg Ala Gln
195 200 205

Ser Ser Ser Thr Gln Leu Gly Ile Val Leu Asn Leu Ala Pro Val Tyr
210 215 220

Leu Glu Gly Asp Thr Pro Ala Asp His Pro Ala His Thr Ser Val Ala
225 230 235 240

Leu His Asp Ala Ile Leu Asn Gly Leu Trp Thr Glu Pro Leu Leu Arg
245 250 255

Ser Arg Tyr Pro Asp Leu Leu Leu Gln Leu Gly Asp Met Val Thr Lys
260 265 270

Asn Ile His Asp Gly Asp Leu Ala Ile Met Ala Glu Pro Ile Asp Trp
275 280 285

Met Gly Ile Asn Tyr Tyr Gln Asp Ile Arg Phe Val Ala Thr Asp Val
290 295 300

Ala Pro Thr Ala Asn Pro Met Ala Pro Pro Gly Asn Asp Leu Pro Gly
305 310 315 320

Thr Val Gly Val Glu Pro Ala Pro Ala Ile Gly Asn Ile Thr Ser Phe
325 330 335

Gly Trp Ser Thr Thr Pro Asp Gly Leu Arg Val Leu Leu Val Gly Leu
340 345 350

Asp Glu Glu Tyr Asp Asn Leu Pro Pro Ile Phe Ile Thr Glu Asn Gly
355 360 365

Cys Ala Tyr Asp Tyr Pro Val Glu Asp Gly Val Val Asn Asp Thr Leu
370 375 380

Arg Val Thr Tyr Met Arg Glu His Leu Thr Ala Leu Ser Gln Ala Ile
385 390 395 400

Glu Ala Gly Val Asn Val Arg Gly Tyr Met His Trp Ser Leu Phe Asp
405 410 415

Asn Phe Glu Trp Ala Glu Gly Tyr Arg Gln Arg Phe Gly Met Val His
420 425 430

Val Asp Phe Glu Thr Leu Glu Arg Thr Pro Lys Ala Ser Ala His Tyr
435 440 445

Tyr Ser Arg Val Ile Thr Asn Asn Ala Leu Ser Asp Asp
450 455 460



<210> 147 
<211> 1242 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 147 

atgctaaaag ttttacgtaa acctattatt tctggattag ctttagctct attattgccg 60 

gcaggggcag ctggtgccga aactaatatt tcaaagaagc caaatataag tggattaacc 120 

gcgccgcaat tagaccaaag atataaagat tctttcacca ttggtgctgc ggttgagccg 180 

tatcaattat tagatgcaaa agattcacaa atgctaaagc ggcattttaa tagtatcgta 240 

gcagagaatg tcatgaagcc tagtagttta cagccagtag aaggacaatt caactgggag 300 

ccggctgata aacttgttca gtttgcgaag gaaaatggaa tggacatgcg aggtcatacg 360 

cttgtctggc atagccaggt accggattgg ttctttgaag atgcggcagg aaatccaatg 420 

gttgtttggg aaaatggcag gcaagtggtt gccgatccat caaagcttca ggaaaacaaa 480 

gagctcttac ttagccgatt acaaaatcat attcaggcag tcgtaacgcg ttataaagat 540 

gatataaaat cttgggatgt tgtcaatgaa gtaatcgatg aatggggcgg acattctgaa 600 

gggctgcgtc aatctccatg gttcctcatc accggaacgg actatattaa agttgctttt 660 

gaaactgcaa gagaatatgc agctccagac gctaagctgt atatcaatga ttacaataca 720 

gaagtagaac caaaaaggac gcacctttah aacttagtaa aaagtttaaa agaagaacag 780 

aacgttccga ttgatggtgt tgggcatcag tctcacattc aaattggctg gccttcagaa 840 

aaagaaattg aagatactat taatatgttt gcagatcttg gtttagataa ccaaatcacc 900 

gagcttgatg ttagtatgta tggctggccg gtaaggtcgt atccaactta tgatgcgatc 960 

ccagaactta aattcatgga tcaagcagct cgttatgatc gtttatttaa gttatatgag 1020

aaattaggag ataaaatcag taatgtgaca ttctggggta ttgcggataa ccatacatgg 1080 

ctgaatgacc gcgcagatgt ttactatgat gaaaatggaa atgttgtatt agatagagaa 1140 

acaccaagag tagaaagagg agcaggaaaa gatgcgccat ttgtatttga tcctgaatac 1200

aatgtaaaac cagcttattg ggcaattatc gatcacaaat aa 1242 

<210> 148 
<211> 413 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL
<222> (1)...(26)

<220>
<221> DOMAIN
<222> (43)...(413)
<223> Glycosyl hydrolase family 10
<220>
<221> SITE
<222> (29)...(32)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (35)...(38)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (298)...(308)
<223> Glycosyl hydrolases family 10 active site. Prosite id = PS00591
<220>
<221> SITE
<222> (353)...(356)
<223> N-glycosylation site. Prosite id = PS00001
<220>
<221> SITE
<222> (362)...(365)
<223> N-glycosylation site. Prosite id = PS00001
<400> 148 

Met Leu Lys Val Leu Arg Lys Pro Ile Ile Ser Gly Leu Ala Leu Ala
1 5 10 15

Leu Leu Leu Pro Ala Gly Ala Ala Gly Ala Glu Thr Asn Ile Ser Lys
20 25 30



Lys Pro Asn Ile Ser Gly Leu Thr Ala Pro Gln Leu Asp Gln Arg Tyr
35 40 45

Lys Asp Ser Phe Thr Ile Gly Ala Ala Val Glu Pro Tyr Gln Leu Leu
50 55 60

Asp Ala Lys Asp Ser Gln Met Leu Lys Arg His Phe Asn Ser Ile Val
65 70 75 80

Ala Glu Asn Val Met Lys Pro Ser Ser Leu Gln Pro Val Glu Gly Gln
85 90 95

Phe Asn Trp Glu Pro Ala Asp Lys Leu Val Gln Phe Ala Lys Glu Asn
100 105 110

Gly Met Asp Met Arg Gly His Thr Leu Val Trp His Ser Gln Val Pro
115 120 125

Asp Trp Phe Phe Glu Asp Ala Ala Gly Asn Pro Met Val Val Trp Glu
130 135 140

Asn Gly Arg Gln Val Val Ala Asp Pro Ser Lys Leu Gln Glu Asn Lys
145 150 155 160

Glu Leu Leu Leu Ser Arg Leu Gln Asn His Ile Gln Ala Val Val Thr
165 170 175

Arg Tyr Lys Asp Asp Ile Lys Ser Trp Asp Val Val Asn Glu Val Ile
180 185 190

Asp Glu Trp Gly Gly His Ser Glu Gly Leu Arg Gln Ser Pro Trp Phe
195 200 205

Leu Ile Thr Gly Thr Asp Tyr Ile Lys Val Ala Phe Glu Thr Ala Arg
210 215 220

Glu Tyr Ala Ala Pro Asp Ala Lys Leu Tyr Ile Asn Asp Tyr Asn Thr
225 230 235 240

Glu Val Glu Pro Lys Arg Thr His Leu Tyr Asn Leu Val Lys Ser Leu
245 250 255

Lys Glu Glu Gln Asn Val Pro Ile Asp Gly Val Gly His Gln Ser His
260 265 270

Ile Gln Ile Gly Trp Pro Ser Glu Lys Glu Ile Glu Asp Thr Ile Asn
275 280 285

Met Phe Ala Asp Leu Gly Leu Asp Asn Gln Ile Thr Glu Leu Asp Val
290 295 300

Ser Met Tyr Gly Trp Pro Val Arg Ser Tyr Pro Thr Tyr Asp Ala Ile
305 310 315 320

Pro Glu Leu Lys Phe Met Asp Gln Ala Ala Arg Tyr Asp Arg Leu Phe
325 330 335

Lys Leu Tyr Glu Lys Leu Gly Asp Lys Ile Ser Asn Val Thr Phe Trp
340 345 350

Gly Ile Ala Asp Asn His Thr Trp Leu Asn Asp Arg Ala Asp Val Tyr
355 360 365

Tyr Asp Glu Asn Gly Asn Val Val Leu Asp Arg Glu Thr Pro Arg Val
370 375 380

Glu Arg Gly Ala Gly Lys Asp Ala Pro Phe Val Phe Asp Pro Glu Tyr
385 390 395 400

Asn Val Lys Pro Ala Tyr Trp Ala Ile Ile Asp His Lys
405 410



<210> 149 
<211> 1068 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 149 



atgacccgaa tgcgcgggat aaacatgggc ggctggctca gccaaattga cgccatacag 60

gaaaaagacc ctgatacatt tcccggaaca gacaaacata tggaaacttt tatccagcag 120

aaggattttg ccaatgtcag gagatggggt ttcgatcatg tgcgaattcc aattgacgcg 180

tatctgttct ttaccgaaaa aggagagccg attgaaaaca ggcttgccaa tcttgaccgc 240

gccgtagagt atgcgctgcc cgccggcctc aacatgatat tggacctcca cgagtgtccg 300

gggcacgatt tttcggaagc agtaaaaagc cctgtccaaa aacttttctc gggagatgac 360

acctggataa ggaaaactga aaaaatatgg gcttgccttg ccgagcgtta ttctcaaaag 420

ggccacgtcc tttttgagac gctcaatgag cctgtcgctc ccaccgcgga gatttggaac 480

aatgttaagg acaggctctg ccgcgaaata cggctccacg ccccctggtc gactataatc 540

accggctcca acatgtggaa ctcagcggca accttcgaca gcctcacgcc ctttgacgac 600

gacaacatga tctacagcgt acatttttac gagccgctgc ttttcacgca ccagaacgca 660


ttgtggatcg acaatccgga aatcaggatc gcaaggccgt


a t C P crcrrrr' (Tr^ hharrrfTt-r^r'r' 
M v^y y y v^y a, u i^ci^y y 




ggctttgtcc ccaaagacgg tttgacgctg tcggacggcg tctggaacag ggatcgtctc 780


crc G crcrc cr c a t* 


ucxy k^y t^^^^^y v. 


gaacgcgttc aggaaaaagt acaatgcgaa gattatctgt 840

aacgagttcg gcgtttacgc gcccgtagac cttcaatcgc agctgcgctg gtatgaagat 900

ctgctctcaa tcctcaatga gacggggatc ggtttcacgt actggaacta taaaaatctc 960

gacttcggga taatttccat aggggagaag ctgcacgaag cccttccgca gtacgacaat 1020

agcgatcgaa taaataaatc ggttcttgaa gtgttaaaaa agtattag 1068



<210> 150 
<211> 355 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (24) . . . (325) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (145) . . . (154) 

<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659 
<220> 

<221> SITE 

<222> (310) . . . (313) 

<223> N-glycosylation site. Prosite id = PS00001
<220> 

<221> SITE 

<222> (350) . . . (353) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 150 

Met Thr Arg Met Arg Gly Ile Asn Met Gly Gly Trp Leu Ser Gln Ile
1 5 10 15

Asp Ala Ile Gln Glu Lys Asp Pro Asp Thr Phe Pro Gly Thr Asp Lys
20 25 30

His Met Glu Thr Phe Ile Gln Gln Lys Asp Phe Ala Asn Val Arg Arg
35 40 45

Trp Gly Phe Asp His Val Arg Ile Pro Ile Asp Ala Tyr Leu Phe Phe
50 55 60

Thr Glu Lys Gly Glu Pro Ile Glu Asn Arg Leu Ala Asn Leu Asp Arg
65 70 75 80

Ala Val Glu Tyr Ala Leu Pro Ala Gly Leu Asn Met Ile Leu Asp Leu
85 90 95

His Glu Cys Pro Gly His Asp Phe Ser Glu Ala Val Lys Ser Pro Val
100 105 110

Gln Lys Leu Phe Ser Gly Asp Asp Thr Trp Ile Arg Lys Thr Glu Lys
115 120 125

Ile Trp Ala Cys Leu Ala Glu Arg Tyr Ser Gln Lys Gly His Val Leu
130 135 140

Phe Glu Thr Leu Asn Glu Pro Val Ala Pro Thr Ala Glu Ile Trp Asn
145 150 155 160

Asn Val Lys Asp Arg Leu Cys Arg Glu Ile Arg Leu His Ala Pro Trp
165 170 175

Ser Thr Ile Ile Thr Gly Ser Asn Met Trp Asn Ser Ala Ala Thr Phe
180 185 190

Asp Ser Leu Thr Pro Phe Asp Asp Asp Asn Met Ile Tyr Ser Val His
195 200 205

Phe Tyr Glu Pro Leu Leu Phe Thr His Gln Asn Ala Leu Trp Ile Asp
210 215 220

Asn Pro Glu Ile Arg Ile Ala Arg Pro Tyr Pro Gly Asp Tyr Gly Pro
225 230 235 240

Gly Phe Val Pro Lys Asp Gly Leu Thr Leu Ser Asp Gly Val Trp Asn
245 250 255

Arg Asp Arg Leu Ala Gly Ala Leu Ala Pro Val Asn Ala Phe Arg Lys
260 265 270

Lys Tyr Asn Ala Lys Ile Ile Cys Asn Glu Phe Gly Val Tyr Ala Pro
275 280 285

Val Asp Leu Gln Ser Gln Leu Arg Trp Tyr Glu Asp Leu Leu Ser Ile
290 295 300

Leu Asn Glu Thr Gly Ile Gly Phe Thr Tyr Trp Asn Tyr Lys Asn Leu
305 310 315 320

Asp Phe Gly Ile Ile Ser Ile Gly Glu Lys Leu His Glu Ala Leu Pro
325 330 335

Gln Tyr Asp Asn Ser Asp Arg Ile Asn Lys Ser Val Leu Glu Val Leu
340 345 350

Lys Lys Tyr
355

<210> 151 

<211> 1068 

<212> DNA 

<213> Unknown 

<220> 

<223> Obtained from environmental sample
<400> 151 

atgaccagaa tgcgcggaat aaacatgggc ggctggctca gccagattga cgccatacag 60

gaaaaagacc ccgataaatt tcccggaata gacaaacaca tggaaacatt tatcggttcc 120 

aatgattttt ccaatgtcag gaaatggggt ttcgatcatg tgcgaatccc gattgacgcg 180 

tacctttttt ttaccgatca ggaagccccg attgaaaaca ggcttgtcca tattgacaac 240 

gccgtaaaat acgcgcggag caacggcctc aaggtgatat tggacctcca cgagtgtccg 300 

gggcatgatt tttcggacgc ggcaaaaggc cctgtccaga aacttttctc cggagatgac 360 

acttatataa aaaagaccga aaaaatatgg gcatgtctgg ccgagcgtta ttcgaaaaac 420 

gacaacgtcc tctatgagac tctcaacgag cctgtcgccc ccacgcctga gatttggaac 480 

actgttaagg acaggctctg ccgggaaata cgcctgcacg ccccctgggc gacgataatc 540 

accggttcca atatgtggaa ttggccgagc acctttgaca gcctgacgcc ctttgacgac 600 

gacaacgtga tctacagcgt gcatttttac gagccgctgc tttttacgca ccagaacgcg 660 

ccctggatca acaattctga aatcaggatc acaaggccgt atccgggcga ttacggcccc 720 

ggctttgtcc gcaaatacgg cttaactctg tcagccggcg tctggaacag ggacaggctg 780 

gcgaaggaat tcgcgcccgt gaacgcgttc aggaaaaaat acaaggcgca ggttatatgc 840 

gacgaattcg gcgtttacgc gcctgtcgag attgaatcgc agcttcgatg gtatgaggat 900 

ttgctctcga tcctcaggga gatgggtata gggttttcgt actggaacta taaaaacctg 960 

gactttggga taatttccat aggggagaag ctgcacgaaa gccttctgca gtacggcaac 1020 

ggcgacagga taaatcatat ggttcttgac ttgctaaaga agtactaa 1068

<210> 152 
<211> 355 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (19) . . . (325) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (145) . . . (154) 

<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659 
<220> 

<221> SITE 

<222> (227) . . . (230) 

<223> N-glycosylation site. Prosite id = PS00001
<400> 152 

Met Thr Arg Met Arg Gly Ile Asn Met Gly Gly Trp Leu Ser Gln Ile
1 5 10 15

Asp Ala Ile Gln Glu Lys Asp Pro Asp Lys Phe Pro Gly Ile Asp Lys
20 25 30

His Met Glu Thr Phe Ile Gly Ser Asn Asp Phe Ser Asn Val Arg Lys
35 40 45

Trp Gly Phe Asp His Val Arg Ile Pro Ile Asp Ala Tyr Leu Phe Phe
50 55 60

Thr Asp Gln Glu Ala Pro Ile Glu Asn Arg Leu Val His Ile Asp Asn
65 70 75 80

Ala Val Lys Tyr Ala Arg Ser Asn Gly Leu Lys Val Ile Leu Asp Leu
85 90 95

His Glu Cys Pro Gly His Asp Phe Ser Asp Ala Ala Lys Gly Pro Val
100 105 110

Gln Lys Leu Phe Ser Gly Asp Asp Thr Tyr Ile Lys Lys Thr Glu Lys
115 120 125

Ile Trp Ala Cys Leu Ala Glu Arg Tyr Ser Lys Asn Asp Asn Val Leu
130 135 140

Tyr Glu Thr Leu Asn Glu Pro Val Ala Pro Thr Pro Glu Ile Trp Asn
145 150 155 160

Thr Val Lys Asp Arg Leu Cys Arg Glu Ile Arg Leu His Ala Pro Trp
165 170 175

Ala Thr Ile Ile Thr Gly Ser Asn Met Trp Asn Trp Pro Ser Thr Phe
180 185 190

Asp Ser Leu Thr Pro Phe Asp Asp Asp Asn Val Ile Tyr Ser Val His
195 200 205

Phe Tyr Glu Pro Leu Leu Phe Thr His Gln Asn Ala Pro Trp Ile Asn
210 215 220

Asn Ser Glu Ile Arg Ile Thr Arg Pro Tyr Pro Gly Asp Tyr Gly Pro
225 230 235 240

Gly Phe Val Arg Lys Tyr Gly Leu Thr Leu Ser Ala Gly Val Trp Asn
245 250 255

Arg Asp Arg Leu Ala Lys Glu Phe Ala Pro Val Asn Ala Phe Arg Lys
260 265 270

Lys Tyr Lys Ala Gln Val Ile Cys Asp Glu Phe Gly Val Tyr Ala Pro
275 280 285

Val Glu Ile Glu Ser Gln Leu Arg Trp Tyr Glu Asp Leu Leu Ser Ile
290 295 300

Leu Arg Glu Met Gly Ile Gly Phe Ser Tyr Trp Asn Tyr Lys Asn Leu
305 310 315 320

Asp Phe Gly Ile Ile Ser Ile Gly Glu Lys Leu His Glu Ser Leu Leu
325 330 335

Gln Tyr Gly Asn Gly Asp Arg Ile Asn His Met Val Leu Asp Leu Leu
340 345 350

Lys Lys Tyr
355



<210> 153 
<211> 1068 
<212> DNA 
<213> Unknown

<220>

<223> Obtained from environmental sample 
<400> 153 

atgcaaagaa tgcgaggctt aaatattggc ggctggctca gccagattga cgccatacag 60 

gaaaaggacc ctgagggctt tcccggaata gacaaacaca tggaaacatt cattgtttcc 120 

ggagattttt acaatatcag gaaatggggt ttcgaccatg tgcggcttcc cattgactcg 180 

tacctgttct ttacggaaga cgatgccccc attgagaaca ggtttgccca tcttgaccgc 240 

gccgtacaat tcgcgaagag caacagcctc aagctgatat tggacctcca cgagtgtccg 300 

ggacacgatt tttccgaagc cgcgaaagga cccgtccaga aacttttttc gggagatgac 360 

gtttacataa aaaaaaccga gaaaatctgg gcctgcctcg ccgagcgtta ttcgaaaaac 420 

gaccatgtac tctttgagac tctcaacgaa cctgtcgctc ccactgccga aatttggaac 480 

aaggttaagg acaggctctg cagagtaatc cgcatccacg cgccctggtc gaccataatc 540 

accggctcca atatgtggaa ctcgccgtcc gccttcgacg gtcttacgcc ctttgacgat 600 

ggcaacgtga tctacagcgt gcatttttac gagccgctgc tttttacgca tcagaacgcg 660 

ccgtggatcg acaatccgga gatcaggacg gcaaggccct atccgggcga ttacggcccc 720 

ggccttgtcc gcaaatacgg tatggcgcag tcggccggca tctggaacaa gaaacggctt 780 

gcaaaagaat ttgagcccgt ggacgcgttc aggaaaaaat acaaggcgcg cgttatctgt 840 

aacgagtttg gcgtgtacgc ccccgccgat ctggaatcgc agcttcgctg gtatgaggat 900 

ctgctctcaa tcctcaacgg gatgcagata ggttactcgt actggaacta caaaaatctg 960 

gatttcggaa taatttccat aggggagaaa ctgcacgaaa gactttcgca gtatgacaac 1020 

gacgagcgga taaaccaccc ggtgctgaat gtgctgaaga aatattaa 1068 

<210> 154 
<211> 355 
<212> PRT 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (19)... (325) 

<223> Cellulase (glycosyl hydrolase family 5) 
<220> 

<221> SITE 

<222> (145) . . . (154) 

<223> Glycosyl hydrolases family 5 signature. Prosite id = PS00659 
<400> 154 
Met Gln Arg Met Arg Gly Leu Asn Ile Gly Gly Trp Leu Ser Gln Ile
1 5 10 15

Asp Ala Ile Gln Glu Lys Asp Pro Glu Gly Phe Pro Gly Ile Asp Lys
20 25 30

His Met Glu Thr Phe Ile Val Ser Gly Asp Phe Tyr Asn Ile Arg Lys
35 40 45

Trp Gly Phe Asp His Val Arg Leu Pro Ile Asp Ser Tyr Leu Phe Phe
50 55 60

Thr Glu Asp Asp Ala Pro Ile Glu Asn Arg Phe Ala His Leu Asp Arg
65 70 75 80

Ala Val Gln Phe Ala Lys Ser Asn Ser Leu Lys Leu Ile Leu Asp Leu
85 90 95

His Glu Cys Pro Gly His Asp Phe Ser Glu Ala Ala Lys Gly Pro Val
100 105 110

Gln Lys Leu Phe Ser Gly Asp Asp Val Tyr Ile Lys Lys Thr Glu Lys
115 120 125

Ile Trp Ala Cys Leu Ala Glu Arg Tyr Ser Lys Asn Asp His Val Leu
130 135 140

Phe Glu Thr Leu Asn Glu Pro Val Ala Pro Thr Ala Glu Ile Trp Asn
145 150 155 160

Lys Val Lys Asp Arg Leu Cys Arg Val Ile Arg Ile His Ala Pro Trp
165 170 175

Ser Thr Ile Ile Thr Gly Ser Asn Met Trp Asn Ser Pro Ser Ala Phe
180 185 190

Asp Gly Leu Thr Pro Phe Asp Asp Gly Asn Val Ile Tyr Ser Val His
195 200 205

Phe Tyr Glu Pro Leu Leu Phe Thr His Gln Asn Ala Pro Trp Ile Asp
210 215 220

Asn Pro Glu Ile Arg Thr Ala Arg Pro Tyr Pro Gly Asp Tyr Gly Pro
225 230 235 240

Gly Leu Val Arg Lys Tyr Gly Met Ala Gln Ser Ala Gly Ile Trp Asn
245 250 255

Lys Lys Arg Leu Ala Lys Glu Phe Glu Pro Val Asp Ala Phe Arg Lys
260 265 270

Lys Tyr Lys Ala Arg Val Ile Cys Asn Glu Phe Gly Val Tyr Ala Pro
275 280 285

Ala Asp Leu Glu Ser Gln Leu Arg Trp Tyr Glu Asp Leu Leu Ser Ile
290 295 300

Leu Asn Gly Met Gln Ile Gly Tyr Ser Tyr Trp Asn Tyr Lys Asn Leu
305 310 315 320

Asp Phe Gly Ile Ile Ser Ile Gly Glu Lys Leu His Glu Arg Leu Ser
325 330 335

Gln Tyr Asp Asn Asp Glu Arg Ile Asn His Pro Val Leu Asn Val Leu
340 345 350

Lys Lys Tyr
355



<210> 155 
<211> 954 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 155 



atgttaaagg attccggttt ttataagggc atcaatctcg gcggctggct gtcccagtgc 60
gactacagcg aggagcgcct gaacagcttc atcaccgaaa aggactttga ggtgatcgcc 120
tcctggggtt ttgaccacgt ccgcctcccg gtggactata atgtcatcca ggatgcggaa 180
ggccgcatga tggagaaagg ccttgcacgc atcgacgccg cgcttcggtt ttgtgagaag 240
accgggcttc acatggttct cgacctgcat aagacaccgg gcttttcctt cgacccgcag 300
gagcaggaga tgggattctt ccggtcggcg cccgaccagc agctcttcta cacgatctgg 360
gagagccttg ctgcccggta tgcagacaaa tcggagatac tcatgttcga tcttctgaac 420
gagatcacgg agccggcgta tctggaggac tggaaccgga tttccgcgga atgcatccgc 480
cgcatccggc gtacgatgcc ggacgtccga attctggtcg gaagctatca ccacaatgcc 540
gtcagcgcgg taaaggacct gcctgcgccg gcagacgata aggtttttta cagctttcac 600
tgttacgacc ctcacaccta tacccaccag ggcgcttact ggatgccgga tgactttgac 660
atcgatgcaa gagtttcctt ccgcgacacc ggcgttaccc ccgtcttctt cgaaaagctg 720
tttgcctccg ccgttgaaaa ggcgcaggcg gaagggacgg aactgtactg cggagaatac 780
ggcgtcatcg acattgttcc gccggaggat gccgttctct ggttccggac cattcatgag 840
gtctttgaag cattcgggat tgcaagaagc gtctggagct ataaggaaat ggatttcggt 900
ctcgccgacc cccgcatgga tgcggtccgg gcagagctgc tgacctgtct ctga 954



<210> 156 
<211> 317 
<212> PRT 
<213> Unknown 



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (14) . . . (302) 

<223> Cellulase (glycosyl hydrolase family 5) 
<400> 156 

Met Leu Lys Asp Ser Gly Phe Tyr Lys Gly Ile Asn Leu Gly Gly Trp
1 5 10 15

Leu Ser Gln Cys Asp Tyr Ser Glu Glu Arg Leu Asn Ser Phe Ile Thr
20 25 30

Glu Lys Asp Phe Glu Val Ile Ala Ser Trp Gly Phe Asp His Val Arg
35 40 45

Leu Pro Val Asp Tyr Asn Val Ile Gln Asp Ala Glu Gly Arg Met Met
50 55 60

Glu Lys Gly Leu Ala Arg Ile Asp Ala Ala Leu Arg Phe Cys Glu Lys
65 70 75 80

Thr Gly Leu His Met Val Leu Asp Leu His Lys Thr Pro Gly Phe Ser
85 90 95

Phe Asp Pro Gln Glu Gln Glu Met Gly Phe Phe Arg Ser Ala Pro Asp
100 105 110

Gln Gln Leu Phe Tyr Thr Ile Trp Glu Ser Leu Ala Ala Arg Tyr Ala
115 120 125

Asp Lys Ser Glu Ile Leu Met Phe Asp Leu Leu Asn Glu Ile Thr Glu
130 135 140

Pro Ala Tyr Leu Glu Asp Trp Asn Arg Ile Ser Ala Glu Cys Ile Arg
145 150 155 160

Arg Ile Arg Arg Thr Met Pro Asp Val Arg Ile Leu Val Gly Ser Tyr
165 170 175

His His Asn Ala Val Ser Ala Val Lys Asp Leu Pro Ala Pro Ala Asp
180 185 190

Asp Lys Val Phe Tyr Ser Phe His Cys Tyr Asp Pro His Thr Tyr Thr
195 200 205

His Gln Gly Ala Tyr Trp Met Pro Asp Asp Phe Asp Ile Asp Ala Arg
210 215 220

Val Ser Phe Arg Asp Thr Gly Val Thr Pro Val Phe Phe Glu Lys Leu
225 230 235 240

Phe Ala Ser Ala Val Glu Lys Ala Gln Ala Glu Gly Thr Glu Leu Tyr
245 250 255

Cys Gly Glu Tyr Gly Val Ile Asp Ile Val Pro Pro Glu Asp Ala Val
260 265 270

Leu Trp Phe Arg Thr Ile His Glu Val Phe Glu Ala Phe Gly Ile Ala
275 280 285

Arg Ser Val Trp Ser Tyr Lys Glu Met Asp Phe Gly Leu Ala Asp Pro
290 295 300

Arg Met Asp Ala Val Arg Ala Glu Leu Leu Thr Cys Leu
305 310 315



<210> 157 
<211> 954 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 157 

atgttaaagg attccggttt ttataagggc atcaatctcg gcggctggct gtcccagtgc 60 
gactacagcg aggagcgcct gaacagcttc atcaccgaaa aagactttga ggtgatcgcc 120 
tcctggggtt ttgaccacgt ccgtctgccg gtggactata atgtcatcca ggatgcggaa 180 
ggccgcatga tggaggaagg cctcgcacgc atcgacgccg cgcttcggtt ttgtgaaaag 240 
accgggcttc acatggttct [illegible] [illegible] gcttttcctt cgacccgcag 300
[illegible] tgggattctt [illegible] [illegible] [illegible] cacgatctgg 360


gagagccttg ctgcccggta tgcagacaaa tcggagatgc tcatgttcga tcttctgaac 420
gagatcacgg agccggcgta tctgaaggac tggaaccgga tttccgcgga atgcatccgc 480
cgcatccggc gtacgatgcc ggacgtccgg attctggtcg gaagctatca ccacaatgcc 540
gtcagcgcgg taaaggacct gcctgcgccg gcggacgacc gggtttttta cagctttcac 600
tgttacgacc ctcacaccta tacccaccag ggcgcttact ggatgccgga tgactttgac 660
atcgatgcaa gagtttcctt ccgcgacatc ggcgtcaccc ccgccttctt cgaagagctg 720
tttgcatctg ccgttgaaaa ggcgaaggtg gaagggacgg aactgtactg cggagaatac 780
ggcgtcatcg acattgttcc gccggaggat gccgttctct ggttccggac cattcatgag 840
gtctttgaga aatacgggat tgcaagaagc gtctggagct ataaggaaat ggatttcggt 900
ctctccgacc cccgcatgga cgcggtccgg gcagagctgc tgacctgtct ctga 954



<210> 158 
<211> 317 
<212> PRT 
<213> Unknown



<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (14) . . . (302)

<223> Cellulase (glycosyl hydrolase family 5) 
<400> 158 

Met Leu Lys Asp Ser Gly Phe Tyr Lys Gly Ile Asn Leu Gly Gly Trp
1 5 10 15

Leu Ser Gln Cys Asp Tyr Ser Glu Glu Arg Leu Asn Ser Phe Ile Thr
20 25 30

Glu Lys Asp Phe Glu Val Ile Ala Ser Trp Gly Phe Asp His Val Arg
35 40 45

Leu Pro Val Asp Tyr Asn Val Ile Gln Asp Ala Glu Gly Arg Met Met
50 55 60

Glu Glu Gly Leu Ala Arg Ile Asp Ala Ala Leu Arg Phe Cys Glu Lys
65 70 75 80

Thr Gly Leu His Met Val Leu Asp Leu His Lys Thr Pro Gly Phe Ser
85 90 95






Phe Asp Pro Gln Glu Gln Glu Met Gly Phe Phe Arg Ser Ala Pro Asp
100 105 110

Gln Gln Arg Phe Tyr Thr Ile Trp Glu Ser Leu Ala Ala Arg Tyr Ala
115 120 125

Asp Lys Ser Glu Met Leu Met Phe Asp Leu Leu Asn Glu Ile Thr Glu
130 135 140

Pro Ala Tyr Leu Lys Asp Trp Asn Arg Ile Ser Ala Glu Cys Ile Arg
145 150 155 160

Arg Ile Arg Arg Thr Met Pro Asp Val Arg Ile Leu Val Gly Ser Tyr
165 170 175

His His Asn Ala Val Ser Ala Val Lys Asp Leu Pro Ala Pro Ala Asp
180 185 190

Asp Arg Val Phe Tyr Ser Phe His Cys Tyr Asp Pro His Thr Tyr Thr
195 200 205

His Gln Gly Ala Tyr Trp Met Pro Asp Asp Phe Asp Ile Asp Ala Arg
210 215 220

Val Ser Phe Arg Asp Ile Gly Val Thr Pro Ala Phe Phe Glu Glu Leu
225 230 235 240

Phe Ala Ser Ala Val Glu Lys Ala Lys Val Glu Gly Thr Glu Leu Tyr
245 250 255

Cys Gly Glu Tyr Gly Val Ile Asp Ile Val Pro Pro Glu Asp Ala Val
260 265 270

Leu Trp Phe Arg Thr Ile His Glu Val Phe Glu Lys Tyr Gly Ile Ala
275 280 285

Arg Ser Val Trp Ser Tyr Lys Glu Met Asp Phe Gly Leu Ser Asp Pro
290 295 300

Arg Met Asp Ala Val Arg Ala Glu Leu Leu Thr Cys Leu
305 310 315



<210> 159 
<211> 1023 
<212> DNA 
<213> Unknown 
<220>

<223> Obtained from environmental sample
<400> 159

atgaatccaa cattcagttc cgtaccggca ttaaaggagc tgtttgcggc ggacttcaac 60 

atcggggcgg cggtgaatcc gacgacgatc cggacgcagg aggcgttgct ggcttatcat 120 

tttaacagcc tgactgcgga gaacgagatg aagttcgtca gcgtgcatcc ggaggagcag 180 

acctatacct tcgaggcggc ggaccggctg gtcgaattcg cccgagagca cggcatggcc 240 

atgcggggac acacgctcgt atggcataac cagacgtccg attggctgtt ccaggatcgc 300 

caaggcggga gggtaagcaa ggaggtgctg ctcggaaggc tccgggagca tattcatacc 360 

atagtaggcc ggtacaagaa cgagatctac gcctgggacg tcgtcaacga ggtcatcgcg 420 

gacgaagggg aggcgctgct gcgcacttcc aaatggacgg aaatcgcggg acctgaattt 480 

atcgctaaag cgttcgagta tgcacatgag gcggatccac aggcgctgtt gttttataac 540 

gactacaacg aatcgaatcc tctgaaacgc gataaaattt acacactcgt tcattcgctg 600 

ctggagcaag gggtgccgat ccatggcatc ggattacaag cgcactggaa cctgtacgat 660 

ccatcgttgg atgagattaa ggcagcgatt gagaagtatg cttcgctggg tttgcagctg 720 

cagctgacgg agctggatct ctcgatgttc cgcttcgatg accggcgaac cgatttgacc 780 

gcgccagagc cggggatgct ggagcaacag gccgagcgtt atgaagccgt gttccggctg 840 

ttgctggagt atcgtgacgt catcagcggc gttaccttct ggggagcggc ggatgattat 900 

acctggctgg acaattttcc ggtgcgcggc cggaagaact ggccgtttct gttcgatgcc 960 

cagcaccagc cgaaggcagc ttatcaccgt gtggcggcat tggctgcgga gcaacgagca 1020 
taa 1023

<210> 160
<211> 340 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 
<222> (10) . . . (335)

<223> Glycosyl hydrolase family 10 
<220> 

<221> SITE 

<222> (91) . . . (94)

<223> N-glycosylation site. Prosite id = PS00001
<220>

<221> SITE
<222> (185) . . . (188)

<223> N-glycosylation site. Prosite id = PS00001
<400> 160 

Met Asn Pro Thr Phe Ser Ser Val Pro Ala Leu Lys Glu Leu Phe Ala
1 5 10 15

Ala Asp Phe Asn Ile Gly Ala Ala Val Asn Pro Thr Thr Ile Arg Thr
20 25 30

Gln Glu Ala Leu Leu Ala Tyr His Phe Asn Ser Leu Thr Ala Glu Asn
35 40 45

Glu Met Lys Phe Val Ser Val His Pro Glu Glu Gln Thr Tyr Thr Phe
50 55 60

Glu Ala Ala Asp Arg Leu Val Glu Phe Ala Arg Glu His Gly Met Ala
65 70 75 80

Met Arg Gly His Thr Leu Val Trp His Asn Gln Thr Ser Asp Trp Leu
85 90 95

Phe Gln Asp Arg Gln Gly Gly Arg Val Ser Lys Glu Val Leu Leu Gly
100 105 110

Arg Leu Arg Glu His Ile His Thr Ile Val Gly Arg Tyr Lys Asn Glu
115 120 125

Ile Tyr Ala Trp Asp Val Val Asn Glu Val Ile Ala Asp Glu Gly Glu
130 135 140

Ala Leu Leu Arg Thr Ser Lys Trp Thr Glu Ile Ala Gly Pro Glu Phe
145 150 155 160

Ile Ala Lys Ala Phe Glu Tyr Ala His Glu Ala Asp Pro Gln Ala Leu
165 170 175

Leu Phe Tyr Asn Asp Tyr Asn Glu Ser Asn Pro Leu Lys Arg Asp Lys
180 185 190

Ile Tyr Thr Leu Val His Ser Leu Leu Glu Gln Gly Val Pro Ile His
195 200 205

Gly Ile Gly Leu Gln Ala His Trp Asn Leu Tyr Asp Pro Ser Leu Asp
210 215 220

Glu Ile Lys Ala Ala Ile Glu Lys Tyr Ala Ser Leu Gly Leu Gln Leu
225 230 235 240

Gln Leu Thr Glu Leu Asp Leu Ser Met Phe Arg Phe Asp Asp Arg Arg
245 250 255

Thr Asp Leu Thr Ala Pro Glu Pro Gly Met Leu Glu Gln Gln Ala Glu
260 265 270

Arg Tyr Glu Ala Val Phe Arg Leu Leu Leu Glu Tyr Arg Asp Val Ile
275 280 285

Ser Gly Val Thr Phe Trp Gly Ala Ala Asp Asp Tyr Thr Trp Leu Asp
290 295 300

Asn Phe Pro Val Arg Gly Arg Lys Asn Trp Pro Phe Leu Phe Asp Ala
305 310 315 320

Gln His Gln Pro Lys Ala Ala Tyr His Arg Val Ala Ala Leu Ala Ala
325 330 335

Glu Gln Arg Ala
340

<210> 161 
<211> 2820 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 161 

atgtcctgcc gcaccctgat gagtaggcgt gtaggatggg gacttttatt gtggggaggt 60 

ttattcctca gaaccggttc ggttacagga caaacttaca attatgccga agtcctgcag 120

aaatctatgt ttttctacga atgtcaggag tctaaaattg ccccgggcaa tcgggtgaca 180 

tggcgagcta atgcagccat gaacgatggg agcgatgttg gaaaagacct gacaggagga 240 

tggtttgatg caggtgacca tgtgaaattt aattttccca tggcgtttac cgctacggcg 300 

ctggcgtggg gagctattga ctttgctcag ggatacatta gttccgggca aatgcaatac 360 

ctgaaacgta atctgcgcta cgtcaatgac tatttcatta aatgtcacac agcccccaac 420 

gaattgtatg gtcaggtggg taatggaggc cttgaccatg ccttttgggg accacccgaa 480 

gtcatgcgca tggctaggcc tgcctataaa attgatgcgt caaaacccgg atcagatctg 540 

gctgccgaaa cagctgctgc aatggctgcc gccagcattg ttttcaaatc cgacgatcct 600 

acctatagcg ctactttgct gaatcatgca aaacagctgt tttcttttgc cgaaacctat 660 
aaaggaaaat attccgacgc tattaccgat gctgcaggat attataactc ctggagcggc 720
tataacgatg aactggtatg gggagctata tggctttacc gggctaccgg cgatgcaacc 780
tatctatcta aggcagaatc ctattacgac aatctgggta atcagggtca ggaacccgtt 840
aaagcctaca aatggaccat tgcatgggat gacaaatcct atggctgtta tgccctactg 900
gccaaattga caggtaagga aaaatacaaa attgacgccg aacgttttct cgactattgg 960
accgatggtt ataatggttc ccggattact tataccccgg gaggactcgc tttcctcgat 1020
atatggggat cgttgcgcta tgctatgaat actgcctttg ttgctgccta ctatgccgat 1080
gcagccactt cagctgctaa aaccacaaaa tatctcaact ttgctaaaca acaactgcat 1140
tatgctcttg gatccaatcc gagcaacaga agctatgtct gtggctttgg caataatcct 1200
cccgttaatc ctcaccatag aggtgcacac ggagcatggt ctaataatgt tcaaggacct 1260
cctaccgaaa cacgacatat cctctacggc gcattagtgg gtggaccagg cagtaatgac 1320


tcctatactg acgaccgatc caattacacc aataacgaag tagcatgtga ctacaatgct 1380
cttttctccg gactgcttgc aaagttcgtc attgattatg gaggcacacc gttagccaac 1440


ttccctgttc gtgaaacccc aaaagatgaa tatttcgttg aagcaaaagc aaacgctaca 1500
ggaaccaatt tctccgaatg gacggtatgg gtatacaacc acactgcatg gccagcccgt 1560






gaaggttctg aatataaatt cagattatac gtaaatattt cggaaggact ggctgcaggc 1620


tatactgcct caaattatgt tgtgcaaacc aataatgccg gtgtggtaaa ctttacccaa 1680
cttttagctg ctgatgcagc taacggcatc tattataccg aagtaacctt taaacctggt 1740
accgaaattt atcctggcgg gcaacagtat gacaagaagg aagctcagat gcgtattagt 1800
ttgcccaatg ctccggcttc tgcatgggat ccgactaacg acccgtcatg ggcgggaatc 1860
acctctacct tgaaacaaat gccgggtata cccatgtatg tagatggtgt aaaggtattt 1920
ggtaatgagc ctgtcccagg tcagacagtt cccgtcaccg gagtaaccgt atcgcctacc 1980
accctgagtc tgactgtagg acagaccagt acactcaccg ctaccgtatc gccggctaat 2040
gctaccaaca aaaacgtcac ctggagcagc agcaatacca gcgtagccac ggtaagctca 2100
acaggcgttg tcacagccgt agcagccggt tcggccacca tcaccgtaac cacagtcgat 2160
ggcgctaaaa cagccacctg cgccgtaacg gtaacaggca gcaccaacgt tcccgtcacc 2220
ggagtaaccg tatcgcccac cacgctgagt ctgaccgtag ggcagaccgc taccctcacc 2280
gctaccgtat cgccggctaa tgctaccaac aagaacgtta cctggagcag cagcaatacc 2340
agcgtagcca cggtaagttc aacaggcgta gttactgccg tagcggccgg ttcggccacc 2400
atcaccgtaa ccaccgtcga tggagctaaa accgctacct gcaccgtaac ggtaacgggc 2460
agcactaccg tacccgtcac cggcgtaact gtatcgccta ccaccctgag tctgaccgtt 2520
ggacaaaccg ctaccctgac cgctaccgta tcgccagctg atgctaccaa caagaacgtc 2580

acctggagca gcagcaatac cagcgtagcc acggtaagct caacaggcgt agtcactgcc 2640 

gtagcggccg gttcagctac catcaccgtg accacagtcg atggggctaa aactgctacc 2700 

tgtgccgtga ccgtaaccgc cggaggttcc accaccccct gcagtaatcc ggtaagcaaa 2760 

accctacctc tggtacagga tggtgccggc gaattcaggt tgagtaatag ttttaattaa 2820 



<210> 162 
<211> 939 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> SIGNAL 
<222> (1) . . . (30)

<220> 

<221> DOMAIN 

<222> (34) . . . (469)

<223> Glycosyl hydrolase family 9 

<220> 

<221> DOMAIN 

<222> (491) . . . (576) 

<223> Cellulose binding domain 

<220> 

<221> DOMAIN 

<222> (738) . . . (816) 

<223> Bacterial Ig-like domain (group 2)
<220> 

<221> DOMAIN 

<222> (825) . . . (903)

<223> Bacterial Ig-like domain (group 2) 
<220> 

<221> DOMAIN 

<222> (651) . . . (729)

<223> Bacterial Ig-like domain (group 2) 
<400> 162 

Met Ser Cys Arg Thr Leu Met Ser Arg Arg Val Gly Trp Gly Leu Leu
1 5 10 15

Leu Trp Gly Gly Leu Phe Leu Arg Thr Gly Ser Val Thr Gly Gln Thr
20 25 30

Tyr Asn Tyr Ala Glu Val Leu Gln Lys Ser Met Phe Phe Tyr Glu Cys
35 40 45

Gln Glu Ser Lys Ile Ala Pro Gly Asn Arg Val Thr Trp Arg Ala Asn
50 55 60

Ala Ala Met Asn Asp Gly Ser Asp Val Gly Lys Asp Leu Thr Gly Gly
65 70 75 80

Trp Phe Asp Ala Gly Asp His Val Lys Phe Asn Phe Pro Met Ala Phe
85 90 95

Thr Ala Thr Ala Leu Ala Trp Gly Ala Ile Asp Phe Ala Gln Gly Tyr
100 105 110

Ile Ser Ser Gly Gln Met Gln Tyr Leu Lys Arg Asn Leu Arg Tyr Val
115 120 125

Asn Asp Tyr Phe Ile Lys Cys His Thr Ala Pro Asn Glu Leu Tyr Gly
130 135 140

Gln Val Gly Asn Gly Gly Leu Asp His Ala Phe Trp Gly Pro Pro Glu
145 150 155 160

Val Met Arg Met Ala Arg Pro Ala Tyr Lys Ile Asp Ala Ser Lys Pro
165 170 175

Gly Ser Asp Leu Ala Ala Glu Thr Ala Ala Ala Met Ala Ala Ala Ser
180 185 190

Ile Val Phe Lys Ser Asp Asp Pro Thr Tyr Ser Ala Thr Leu Leu Asn
195 200 205

His Ala Lys Gln Leu Phe Ser Phe Ala Glu Thr Tyr Lys Gly Lys Tyr
210 215 220

Ser Asp Ala Ile Thr Asp Ala Ala Gly Tyr Tyr Asn Ser Trp Ser Gly
225 230 235 240

Tyr Asn Asp Glu Leu Val Trp Gly Ala Ile Trp Leu Tyr Arg Ala Thr
245 250 255

Gly Asp Ala Thr Tyr Leu Ser Lys Ala Glu Ser Tyr Tyr Asp Asn Leu
260 265 270

Gly Asn Gln Gly Gln Glu Pro Val Lys Ala Tyr Lys Trp Thr Ile Ala
275 280 285

Trp Asp Asp Lys Ser Tyr Gly Cys Tyr Ala Leu Leu Ala Lys Leu Thr
290 295 300

Gly Lys Glu Lys Tyr Lys Ile Asp Ala Glu Arg Phe Leu Asp Tyr Trp
305 310 315 320

Thr Asp Gly Tyr Asn Gly Ser Arg Ile Thr Tyr Thr Pro Gly Gly Leu
325 330 335

Ala Phe Leu Asp Ile Trp Gly Ser Leu Arg Tyr Ala Met Asn Thr Ala
340 345 350

Phe Val Ala Ala Tyr Tyr Ala Asp Ala Ala Thr Ser Ala Ala Lys Thr
355 360 365

Thr Lys Tyr Leu Asn Phe Ala Lys Gln Gln Leu His Tyr Ala Leu Gly
370 375 380

Ser Asn Pro Ser Asn Arg Ser Tyr Val Cys Gly Phe Gly Asn Asn Pro
385 390 395 400

Pro Val Asn Pro His His Arg Gly Ala His Gly Ala Trp Ser Asn Asn
405 410 415

Val Gln Gly Pro Pro Thr Glu Thr Arg His Ile Leu Tyr Gly Ala Leu
420 425 430

Val Gly Gly Pro Gly Ser Asn Asp Ser Tyr Thr Asp Asp Arg Ser Asn
435 440 445

Tyr Thr Asn Asn Glu Val Ala Cys Asp Tyr Asn Ala Leu Phe Ser Gly
450 455 460

Leu Leu Ala Lys Phe Val Ile Asp Tyr Gly Gly Thr Pro Leu Ala Asn
465 470 475 480

Phe Pro Val Arg Glu Thr Pro Lys Asp Glu Tyr Phe Val Glu Ala Lys
485 490 495

Ala Asn Ala Thr Gly Thr Asn Phe Ser Glu Trp Thr Val Trp Val Tyr
500 505 510

Asn His Thr Ala Trp Pro Ala Arg Glu Gly Ser Glu Tyr Lys Phe Arg
515 520 525

Leu Tyr Val Asn Ile Ser Glu Gly Leu Ala Ala Gly Tyr Thr Ala Ser
530 535 540

Asn Tyr Val Val Gln Thr Asn Asn Ala Gly Val Val Asn Phe Thr Gln
545 550 555 560

Leu Leu Ala Ala Asp Ala Ala Asn Gly Ile Tyr Tyr Thr Glu Val Thr
565 570 575

Phe Lys Pro Gly Thr Glu Ile Tyr Pro Gly Gly Gln Gln Tyr Asp Lys
580 585 590

Lys Glu Ala Gln Met Arg Ile Ser Leu Pro Asn Ala Pro Ala Ser Ala
595 600 605

Trp Asp Pro Thr Asn Asp Pro Ser Trp Ala Gly Ile Thr Ser Thr Leu
610 615 620

Lys Gln Met Pro Gly Ile Pro Met Tyr Val Asp Gly Val Lys Val Phe
625 630 635 640

Gly Asn Glu Pro Val Pro Gly Gln Thr Val Pro Val Thr Gly Val Thr
645 650 655

Val Ser Pro Thr Thr Leu Ser Leu Thr Val Gly Gln Thr Ser Thr Leu
660 665 670

Thr Ala Thr Val Ser Pro Ala Asn Ala Thr Asn Lys Asn Val Thr Trp
675 680 685

Ser Ser Ser Asn Thr Ser Val Ala Thr Val Ser Ser Thr Gly Val Val
690 695 700

Thr Ala Val Ala Ala Gly Ser Ala Thr Ile Thr Val Thr Thr Val Asp
705 710 715 720

Gly Ala Lys Thr Ala Thr Cys Ala Val Thr Val Thr Gly Ser Thr Asn
725 730 735

Val Pro Val Thr Gly Val Thr Val Ser Pro Thr Thr Leu Ser Leu Thr
740 745 750

Val Gly Gln Thr Ala Thr Leu Thr Ala Thr Val Ser Pro Ala Asn Ala
755 760 765

Thr Asn Lys Asn Val Thr Trp Ser Ser Ser Asn Thr Ser Val Ala Thr
770 775 780

Val Ser Ser Thr Gly Val Val Thr Ala Val Ala Ala Gly Ser Ala Thr
785 790 795 800

Ile Thr Val Thr Thr Val Asp Gly Ala Lys Thr Ala Thr Cys Thr Val
805 810 815

Thr Val Thr Gly Ser Thr Thr Val Pro Val Thr Gly Val Thr Val Ser
820 825 830

Pro Thr Thr Leu Ser Leu Thr Val Gly Gln Thr Ala Thr Leu Thr Ala
835 840 845

Thr Val Ser Pro Ala Asp Ala Thr Asn Lys Asn Val Thr Trp Ser Ser
850 855 860

Ser Asn Thr Ser Val Ala Thr Val Ser Ser Thr Gly Val Val Thr Ala
865 870 875 880

Val Ala Ala Gly Ser Ala Thr Ile Thr Val Thr Thr Val Asp Gly Ala
885 890 895

Lys Thr Ala Thr Cys Ala Val Thr Val Thr Ala Gly Gly Ser Thr Thr
900 905 910

Pro Cys Ser Asn Pro Val Ser Lys Thr Leu Pro Leu Val Gln Asp Gly
915 920 925

Ala Gly Glu Phe Arg Leu Ser Asn Ser Phe Asn
930 935

<210> 163 
<211> 2733 
<212> DNA 
<213> Unknown

<220> 

<223> Obtained from environmental sample 
<400> 163 

atgcaaactt acaattatgc cgaagtcctg cagaaatcta tgtttttcta cgaatgtcag 60 

gagtctaaaa ttgccccggg caatcgggtg acatggcgag ctaatgcagc catgaacgat 120 

gggagcgatg ttggaaaaga cctgacagga ggatggtttg atgcaggtga ccatgtgaaa 180 

tttaattttc ccatggcgtt taccgctacg gcgctggcgt ggggagctat tgactttgct 240 

cagggataca ttagttccgg gcaaatgcaa tacctgaaac gtaatctgcg ctacgtcaat 300 

gactatttca ttaaatgtca cacagccccc aacgaattgt atggtcaggt gggtaatgga 360 
ggccttgacc atgccttttg gggaccaccc gaagtcatgc gcatggctag gcctgcctat 420
aaaattgatg cgtcaaaacc cggatcagat ctggctgccg aaacagctgc tgcaatggct 480
gccgccagca ttgttttcaa atccgacgat cctacctata gcgctacttt gctgaatcat 540
gcaaaacagc tgttttcttt tgccgaaacc tataaaggaa aatattccga cgctattacc 600
gatgctgcag gatattataa ctcctggagc ggctataacg atgaactggt atggggagct 660
atatggcttt accgggctac cggcgatgca acctatctat ctaaggcaga atcctattac 720
gacaatctgg gtaatcaggg tcaggaaccc gttaaagcct acaaatggac cattgcatgg 780
gatgacaaat cctatggctg ttatgcccta ctggccaaat tgacaggtaa ggaaaaatac 840
aaaattgacg ccgaacgttt tctcgactat tggaccgatg gttataatgg ttcccggatt 900
acttataccc cgggaggact cgctttcctc gatatatggg gatcgttgcg ctatgctatg 960
aatactgcct ttgttgctgc ctactatgcc gatgcagcca cttcagctgc taaaaccaca 1020
aaatatctca actttgctaa acaacaactg cattatgctc ttggatccaa tccgagcaac 1080
agaagctatg tctgtggctt tggcaataat cctcccgtta atcctcacca tagaggtgca 1140
cacggagcat ggtctaataa tgttcaagga cctcctaccg aaacacgaca tatcctctac 1200
ggcgcattag tgggtggacc aggcagtaat gactcctata ctgacgaccg atccaattac 1260
accaataacg aagtagcatg tgactacaat gctcttttct ccggactgct tgcaaagttc 1320
gtcattgatt atggaggcac accgttagcc aacttccctg ttcgtgaaac cccaaaagat 1380
gaatatttcg ttgaagcaaa agcaaacgct acaggaacca atttctccga atggacggta 1440
tgggtataca accacactgc atggccagcc cgtgaaggtt ctgaatataa attcagatta 1500
tacgtaaata tttcggaagg actggctgca ggctatactg cctcaaatta tgttgtgcaa 1560
accaataatg ccggtgtggt aaactttacc caacttttag ctgctgatgc agctaacggc 1620
atctattata ccgaagtaac ctttaaacct ggtaccgaaa tttatcctgg cgggcaacag 1680
tatgacaaga aggaagctca gatgcgtatt agtttgccca atgctccggc ttctgcatgg 1740
gatccgacta acgacccgtc atgggcggga atcacctcta ccttgaaaca aatgccgggt 1800
atacccatgt atgtagatgg tgtaaaggta tttggtaatg agcctgtccc aggtcagaca 1860
gttcccgtca ccggagtaac cgtatcgcct accaccctga gtctgactgt aggacagacc 1920
agtacactca ccgctaccgt atcgccggct aatgctacca acaaaaacgt cacctggagc 1980
agcagcaata ccagcgtagc cacggtaagc tcaacaggcg ttgtcacagc cgtagcagcc 2040
ggttcggcca ccatcaccgt aaccacagtc gatggcgcta aaacagccac ctgcgccgta 2100
acggtaacag gcagcaccaa cgttcccgtc accggagtaa ccgtatcgcc caccacgctg 2160
agtctgaccg tagggcagac cgctaccctc accgctaccg tatcgccggc taatgctacc 2220
aacaagaacg ttacctggag cagcagcaat accagcgtag ccacggtaag ttcaacaggc 2280
gtagttactg ccgtagcggc cggttcggcc accatcaccg taaccaccgt cgatggagct 2340
aaaaccgcta cctgcaccgt aacggtaacg ggcagcacta ccgtacccgt caccggcgta 2400
actgtatcgc ctaccaccct gagtctgacc gttggacaaa ccgctaccct gaccgctacc 2460
gtatcgccag ctgatgctac caacaagaac gtcacctgga gcagcagcaa taccagcgta 2520
gccacggtaa gctcaacagg cgtagtcact gccgtagcgg ccggttcagc taccatcacc 2580
gtgaccacag tcgatggggc taaaactgct acctgtgccg tgaccgtaac cgccggaggt 2640
tccaccaccc cctgcagtaa tccggtaagc aaaaccctac ctctggtaca ggatggtgcc 2700
ggcgaattca ggttgagtaa tagttttaat taa 2733



<210> 164 
<211> 910 
<212> PRT 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<220> 

<221> DOMAIN 

<222> (5) . . . (440) 

<223> Glycosyl hydrolase family 9 

<220> 

<221> DOMAIN 

<222> (462) . . . (547) 

<223> Cellulose binding domain 

<220> 

<221> DOMAIN 

<222> (709) . . . (787) 

<223> Bacterial Ig-like domain (group 2) 
<220> 

<221> DOMAIN 

<222> (796) . . . (874) 

<223> Bacterial Ig-like domain (group 2) 
<220> 

<221> DOMAIN 

<222> (622) . . . (700) 

<223> Bacterial Ig-like domain (group 2) 
<400> 164 

Met Gin Thr Tyr Asn Tyr Ala Glu Val Leu Gin Lys Ser Met Phe Phe 
15 10 15 



Tyr Glu Cys Gin Glu Ser Lys He Ala Pro Gly Asn Arg Val Thr Trp 
20 25 30 
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Arg Ala Asn Ala Ala Met Asn Asp Gly Ser Asp Val Gly Lys Asp Leu 
35 40 45 

Thr Gly Gly Trp Phe Asp Ala Gly Asp His Val Lys Phe Asn Phe Pro 
50 55 60 

Met Ala Phe Thr Ala Thr Ala Leu Ala Trp Gly Ala lie Asp Phe Ala 
65 70 75 80 

Gin Gly Tyr He Ser Ser Gly Gin Met Gin Tyr Leu Lys Arg Asn Leu 
85 90 95 

Arg Tyr Val Asn Asp Tyr Phe He Lys Cys His Thr Ala Pro Asn Glu 
100 105 110 

Leu Tyr Gly Gin Val Gly Asn Gly Gly Leu Asp His Ala Phe Trp Gly 
115 120 125 

Pro Pro Glu Val Met Arg Met Ala Arg Pro Ala Tyr Lys He Asp Ala 
130 135 140 

Ser Lys Pro Gly Ser Asp Leu Ala Ala Glu Thr Ala Ala Ala Met Ala 
145 150 155 160 

Ala Ala Ser He Val Phe Lys Ser Asp Asp Pro Thr Tyr Ser Ala Thr 
165 170 175 

Leu Leu Asn His Ala Lys Gin Leu Phe Ser Phe Ala Glu Thr Tyr Lys 
180 185 190 

Gly Lys Tyr Ser Asp Ala He Thr Asp Ala Ala Gly Tyr Tyr Asn Ser 
195 200 205 

Trp Ser Gly Tyr Asn Asp Glu Leu Val Trp Gly Ala He Trp Leu Tyr 
210 215 220 

Arg Ala Thr Gly Asp Ala Thr Tyr Leu Ser Lys Ala Glu Ser Tyr Tyr 
225 230 235 240 

Asp Asn Leu Gly Asn Gln Gly Gln Glu Pro Val Lys Ala Tyr Lys Trp 
245 250 255 

Thr Ile Ala Trp Asp Asp Lys Ser Tyr Gly Cys Tyr Ala Leu Leu Ala 
260 265 270 

Lys Leu Thr Gly Lys Glu Lys Tyr Lys Ile Asp Ala Glu Arg Phe Leu 
275 280 285 

Asp Tyr Trp Thr Asp Gly Tyr Asn Gly Ser Arg Ile Thr Tyr Thr Pro 
290 295 300 

Gly Gly Leu Ala Phe Leu Asp Ile Trp Gly Ser Leu Arg Tyr Ala Met 
305 310 315 320 

Asn Thr Ala Phe Val Ala Ala Tyr Tyr Ala Asp Ala Ala Thr Ser Ala 
325 330 335 

Ala Lys Thr Thr Lys Tyr Leu Asn Phe Ala Lys Gln Gln Leu His Tyr 
340 345 350 

Ala Leu Gly Ser Asn Pro Ser Asn Arg Ser Tyr Val Cys Gly Phe Gly 
355 360 365 

Asn Asn Pro Pro Val Asn Pro His His Arg Gly Ala His Gly Ala Trp 
370 375 380 

Ser Asn Asn Val Gln Gly Pro Pro Thr Glu Thr Arg His Ile Leu Tyr 
385 390 395 400 

Gly Ala Leu Val Gly Gly Pro Gly Ser Asn Asp Ser Tyr Thr Asp Asp 
405 410 415 

Arg Ser Asn Tyr Thr Asn Asn Glu Val Ala Cys Asp Tyr Asn Ala Leu 
420 425 430 

Phe Ser Gly Leu Leu Ala Lys Phe Val Ile Asp Tyr Gly Gly Thr Pro 
435 440 445 

Leu Ala Asn Phe Pro Val Arg Glu Thr Pro Lys Asp Glu Tyr Phe Val 
450 455 460 

Glu Ala Lys Ala Asn Ala Thr Gly Thr Asn Phe Ser Glu Trp Thr Val 
465 470 475 480 

Trp Val Tyr Asn His Thr Ala Trp Pro Ala Arg Glu Gly Ser Glu Tyr 
485 490 495 

Lys Phe Arg Leu Tyr Val Asn Ile Ser Glu Gly Leu Ala Ala Gly Tyr 
500 505 510 

Thr Ala Ser Asn Tyr Val Val Gln Thr Asn Asn Ala Gly Val Val Asn 
515 520 525 

Phe Thr Gln Leu Leu Ala Ala Asp Ala Ala Asn Gly Ile Tyr Tyr Thr 
530 535 540 

Glu Val Thr Phe Lys Pro Gly Thr Glu Ile Tyr Pro Gly Gly Gln Gln 
545 550 555 560 

Tyr Asp Lys Lys Glu Ala Gln Met Arg Ile Ser Leu Pro Asn Ala Pro 
565 570 575 

Ala Ser Ala Trp Asp Pro Thr Asn Asp Pro Ser Trp Ala Gly Ile Thr 
580 585 590 

Ser Thr Leu Lys Gln Met Pro Gly Ile Pro Met Tyr Val Asp Gly Val 
595 600 605 

Lys Val Phe Gly Asn Glu Pro Val Pro Gly Gln Thr Val Pro Val Thr 
610 615 620 

Gly Val Thr Val Ser Pro Thr Thr Leu Ser Leu Thr Val Gly Gln Thr 
625 630 635 640 

Ser Thr Leu Thr Ala Thr Val Ser Pro Ala Asn Ala Thr Asn Lys Asn 
645 650 655 

Val Thr Trp Ser Ser Ser Asn Thr Ser Val Ala Thr Val Ser Ser Thr 
660 665 670 

Gly Val Val Thr Ala Val Ala Ala Gly Ser Ala Thr Ile Thr Val Thr 
675 680 685 

Thr Val Asp Gly Ala Lys Thr Ala Thr Cys Ala Val Thr Val Thr Gly 
690 695 700 

Ser Thr Asn Val Pro Val Thr Gly Val Thr Val Ser Pro Thr Thr Leu 
705 710 715 720 

Ser Leu Thr Val Gly Gln Thr Ala Thr Leu Thr Ala Thr Val Ser Pro 
725 730 735 

Ala Asn Ala Thr Asn Lys Asn Val Thr Trp Ser Ser Ser Asn Thr Ser 
740 745 750 

Val Ala Thr Val Ser Ser Thr Gly Val Val Thr Ala Val Ala Ala Gly 
755 760 765 

Ser Ala Thr Ile Thr Val Thr Thr Val Asp Gly Ala Lys Thr Ala Thr 
770 775 780 

Cys Thr Val Thr Val Thr Gly Ser Thr Thr Val Pro Val Thr Gly Val 
785 790 795 800 

Thr Val Ser Pro Thr Thr Leu Ser Leu Thr Val Gly Gln Thr Ala Thr 
805 810 815 

Leu Thr Ala Thr Val Ser Pro Ala Asp Ala Thr Asn Lys Asn Val Thr 
820 825 830 

Trp Ser Ser Ser Asn Thr Ser Val Ala Thr Val Ser Ser Thr Gly Val 
835 840 845 

Val Thr Ala Val Ala Ala Gly Ser Ala Thr Ile Thr Val Thr Thr Val 
850 855 860 

Asp Gly Ala Lys Thr Ala Thr Cys Ala Val Thr Val Thr Ala Gly Gly 
865 870 875 880 

Ser Thr Thr Pro Cys Ser Asn Pro Val Ser Lys Thr Leu Pro Leu Val 
885 890 895 

Gln Asp Gly Ala Gly Glu Phe Arg Leu Ser Asn Ser Phe Asn 
900 905 910 

<210> 165 
<211> 1347 
<212> DNA 
<213> Unknown 

<220> 

<223> Obtained from environmental sample 
<400> 165 

atgacaatta acaacaaaac tacagcgagt cctagtattc ccagcaccca caattccctc 60 
ccgtcgcttc gcacactgtt taccaccagc ctgctcacgc tggccctgac cgcctgcggt 120 
ggttcttcca gcagcgacaa ggacccttca agctccagct ccagtgaatc atcaagttcc 180 
agcgaatcct cgagctcagc ttccagcgaa tcctcgagca gtgagtccag cagtagctct 240 
tccgcgggcc atttctccat cgagccggac ttccagctct acagcctggc caacttcccg 300 
gtgggcgtgg cggtctccgc cgccaacgag aacgacagca tcttcaacag tccggatgcc 360 
gccgaacgtc aggccgttat tattgagcac ttctctcagc tcaccgccgg caacatcatg 420 
aaaatgagct acctgcagcc gagtcaaggc aacttcacct tcgatgacgc cgacgagttg 480 
gttaacttcg cccaagccaa tggcatgacc gtacacggcc actccaccat ctggcacgcg 540 
gactaccaag taccgaactt catgagaaac tttgaaggtg accaggagga atgggcagaa 600 
attctgaccg atcacgtcac taccatcatc gagcacttcc ccgacgatgt ggtcatcagc 660 
tgggacgtgg tgaacgaggc tgtcgatcaa ggcacggcga acggctggcg ccattcggtg 720 
ttctacaatg cattcgacgc cccggaagaa ggcgacattc ccgaatacat caaagtcgct 780 
ttccgcgccg cgcgcgaggc tgacgccaac gtagacctct actacaacga ctacgacaat 840 
accgccaatg cccagcgcct ggccaaaaca ctgcaaattg ccgaggtact ggacgccgaa 900 
ggcaccattg acggcgtcgg tttccagatg cacgcctaca tggattaccc gagcctgacg 960 
cattttgaaa acgccttccg gcaagtcgtc gacctggggc tcaaagtgaa agttaccgag 1020 
ctggacgtat ccgtagtcaa cccctacggc ggcgaagcac ctccacaacc ggaatacgac 1080 
aaagaactgg ccggcgcgca aaaactgcgc ttctgccaaa tcgccgaagt ttacatgaac 1140 
actgtacccg aggagttacg cggtggcttc accgtctggg gcctgaccga tgatgaaagt 1200 
tggctgatgc aacagttcag aaacgccacc ggcgccgact acgacgacgt ctggccgtta 1260 
ctgttcaatg ccgacaaatc cgccaaaccg gcactgcaag gcgtggccga cgcctttacc 1320 
ggacaaacct gcacctccga gttctaa 1347 



<210> 166 
<211> 448 
<212> PRT 
<213> Unknown 

<220> 
<223> Obtained from environmental sample 

<220> 
<221> SIGNAL 
<222> (1)...(45) 

<400> 166 

Met Thr Ile Asn Asn Lys Thr Thr Ala Ser Pro Ser Ile Pro Ser Thr 
1 5 10 15 

His Asn Ser Leu Pro Ser Leu Arg Thr Leu Phe Thr Thr Ser Leu Leu 
20 25 30 

Thr Leu Ala Leu Thr Ala Cys Gly Gly Ser Ser Ser Ser Asp Lys Asp 
35 40 45 

Pro Ser Ser Ser Ser Ser Ser Glu Ser Ser Ser Ser Ser Glu Ser Ser 
50 55 60 

Ser Ser Ala Ser Ser Glu Ser Ser Ser Ser Glu Ser Ser Ser Ser Ser 
65 70 75 80 

Ser Ala Gly His Phe Ser Ile Glu Pro Asp Phe Gln Leu Tyr Ser Leu 
85 90 95 

Ala Asn Phe Pro Val Gly Val Ala Val Ser Ala Ala Asn Glu Asn Asp 
100 105 110 

Ser Ile Phe Asn Ser Pro Asp Ala Ala Glu Arg Gln Ala Val Ile Ile 
115 120 125 

Glu His Phe Ser Gln Leu Thr Ala Gly Asn Ile Met Lys Met Ser Tyr 
130 135 140 

Leu Gln Pro Ser Gln Gly Asn Phe Thr Phe Asp Asp Ala Asp Glu Leu 
145 150 155 160 

Val Asn Phe Ala Gln Ala Asn Gly Met Thr Val His Gly His Ser Thr 
165 170 175 

Ile Trp His Ala Asp Tyr Gln Val Pro Asn Phe Met Arg Asn Phe Glu 
180 185 190 

Gly Asp Gln Glu Glu Trp Ala Glu Ile Leu Thr Asp His Val Thr Thr 
195 200 205 

Ile Ile Glu His Phe Pro Asp Asp Val Val Ile Ser Trp Asp Val Val 
210 215 220 

Asn Glu Ala Val Asp Gln Gly Thr Ala Asn Gly Trp Arg His Ser Val 
225 230 235 240 

Phe Tyr Asn Ala Phe Asp Ala Pro Glu Glu Gly Asp Ile Pro Glu Tyr 
245 250 255 

Ile Lys Val Ala Phe Arg Ala Ala Arg Glu Ala Asp Ala Asn Val Asp 
260 265 270 

Leu Tyr Tyr Asn Asp Tyr Asp Asn Thr Ala Asn Ala Gln Arg Leu Ala 
275 280 285 

Lys Thr Leu Gln Ile Ala Glu Val Leu Asp Ala Glu Gly Thr Ile Asp 
290 295 300 

Gly Val Gly Phe Gln Met His Ala Tyr Met Asp Tyr Pro Ser Leu Thr 
305 310 315 320 

His Phe Glu Asn Ala Phe Arg Gln Val Val Asp Leu Gly Leu Lys Val 
325 330 335 

Lys Val Thr Glu Leu Asp Val Ser Val Val Asn Pro Tyr Gly Gly Glu 
340 345 350 

Ala Pro Pro Gln Pro Glu Tyr Asp Lys Glu Leu Ala Gly Ala Gln Lys 
355 360 365 

Leu Arg Phe Cys Gln Ile Ala Glu Val Tyr Met Asn Thr Val Pro Glu 
370 375 380 

Glu Leu Arg Gly Gly Phe Thr Val Trp Gly Leu Thr Asp Asp Glu Ser 
385 390 395 400 

Trp Leu Met Gln Gln Phe Arg Asn Ala Thr Gly Ala Asp Tyr Asp Asp 
405 410 415 

Val Trp Pro Leu Leu Phe Asn Ala Asp Lys Ser Ala Lys Pro Ala Leu 
420 425 430 

Gln Gly Val Ala Asp Ala Phe Thr Gly Gln Thr Cys Thr Ser Glu Phe 
435 440 445 



